Abstract



We consider the k-disjoint-clique problem. The input is an undirected graph G 
in which the nodes represent data items, and edges indicate a similarity between the 
corresponding items. The problem is to find within the graph k disjoint cliques that 
cover the maximum number of nodes of G. This problem may be understood as 
a general way to pose the classical 'clustering' problem. In clustering, one is given 
data items and a distance function, and one wishes to partition the data into disjoint 
clusters of data items, such that the items in each cluster are close to each other. Our 
formulation additionally allows 'noise' nodes to be present in the input data that are 
not part of any of the cliques. 

The k-disjoint-clique problem is NP-hard, but we show that a convex relaxation
can solve it in polynomial time for input instances constructed in a certain way. The
input instances for which our algorithm finds the optimal solution consist of k disjoint
large cliques (called 'planted cliques') that are then obscured by noise edges and noise
nodes inserted either at random or by an adversary.

1 Introduction 

Given a set of data, clustering seeks to partition the data into sets of similar objects. These 
subsets are called 'clusters', and the goal is to find a few large clusters covering as much 
of the data as possible. Clustering plays a significant role in a wide range of applications,
including, but not limited to, information retrieval, pattern recognition, computational
biology, and image processing. For a recent survey of clustering techniques and algorithms with
a particular focus on applications in data mining, see [3].
In this paper, we consider the following graph-based representation of data. Given a set
of data where each pair of objects is known to be similar or dissimilar, we consider the graph
G = (V, E) where the objects in the given data set are the set of nodes of G and any two
nodes are adjacent if and only if their corresponding objects are similar. Hence, for this
representation of the data, clustering is equivalent to partitioning G into disjoint cliques.
Therefore, for any integer k, the problem of identifying k clusters in the data containing the
maximum number of objects is equivalent to the maximum node k-disjoint-clique problem of
the corresponding graph G. Given an undirected graph G = (V, E) and integer $k \in [1, |V|]$,
the maximum node k-disjoint-clique problem refers to the problem of finding the subgraph
K of G composed of a collection of k disjoint cliques, called a k-disjoint-clique subgraph,
maximizing the number of nodes in K. Unfortunately, since the k = 1 case is exactly the
maximum clique problem, well-known to be NP-hard [8], the maximum node k-disjoint-clique
problem is NP-hard.
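
To make the definition concrete, the following minimal sketch checks whether a proposed family of node sets forms a k-disjoint-clique subgraph and, if so, counts the nodes it covers. The toy graph and the helper names `is_clique` and `covered_nodes` are illustrative assumptions and are not part of the method analyzed in this paper; the sketch only verifies a candidate solution and does not search for an optimal one.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """Return True if every pair of distinct nodes in `nodes` is adjacent."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def covered_nodes(adj, cliques):
    """Number of nodes covered by `cliques` if they form a k-disjoint-clique
    subgraph of the graph `adj`; otherwise None."""
    seen = set()
    for c in cliques:
        c = set(c)
        if not c or (seen & c) or not is_clique(adj, sorted(c)):
            return None
        seen |= c
    return len(seen)

# Toy graph (hypothetical data): two triangles joined by one noise edge,
# plus a noise node 6 attached to node 5.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3), (5, 6)]
adj = {v: set() for v in range(7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(covered_nodes(adj, [{0, 1, 2}, {3, 4, 5}]))  # two disjoint triangles
print(covered_nodes(adj, [{0, 1, 2}, {2, 3}]))     # sets overlap at node 2
```

Checking a candidate is easy; the hardness lies in searching over all families of k disjoint cliques.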

In Section 2 we relax the maximum node k-disjoint-clique problem to a semidefinite
program. We show that this convex relaxation can recover the exact solution in two cases.
In the first case, presented in Section 3, the input graph is constructed deterministically
as follows. The input graph consists of k disjoint cliques $C_1, \dots, C_k$, each of size at least
n, plus a number of diversionary nodes and edges inserted by an adversary. We show that
the algorithm can tolerate up to $O(n^2)$ diversionary edges and nodes provided that, for
each $i = 1, \dots, k$, each node in the clique $C_i$ is adjacent to at most $O(\min\{|C_i|, |C_j|\})$
nodes in the clique $C_j$ for each $j = 1, \dots, k$ such that $i \ne j$. In Section 4 we suppose
that the graph contains a k-disjoint-clique subgraph K and some additional nodes, and
the remaining nonclique edges are added to the graph independently at random with fixed
probability p. We give a general formula for clique sizes that can be recovered by the
algorithm; for example, if the graph contains N nodes total and $N^{1/4}$ planted cliques each of
size $\Omega(N^{1/2})$, then the convex relaxation will find them. We develop the necessary optimality
and uniqueness theorems in Section 2 and provide the necessary background on random
matrices in Section 4.1.

The rationale for this line of analysis is that in real-world applications of clustering, it 
is often the case that the sought-after clusters are present in the input data but are hidden 
by the presence of noisy data. Therefore, it is of interest to find cases of clustering data in 
which the clusters are hidden by noise and yet can still be found in polynomial time. 

Our analysis is related in an indirect manner to work on measuring 'clusterability' of data, 
e.g., Ostrovsky et al. [13]. In that work, the authors prove that a certain clustering algorithm 
works well if the data has k 'good' clusters. Our assumptions and analysis, however, differ 
substantially from [13] (for example, we do not require all the data items to be placed in
clusters), so there is no direct relationship between our result and theirs. 

Our results and techniques can be seen as an extension of those in [2] from the maximum 
clique problem to the maximum node k-disjoint-clique problem. Indeed, in the k = 1 case,
our results agree with those presented in [2], as well as those found in earlier work by Alon 
et al. [1], and by Feige and Krauthgamer [6]. More generally, our results follow in the spirit 
of several recent papers, in particular Recht et al. [H] and Candes and Recht [5], which 
consider nuclear norm minimization, a special case of semidefinite programming, as a convex 
relaxation of matrix rank minimization. Matrix rank minimization refers to the problem 
of finding a minimum rank solution of a given linear system. These papers have results of 
the following general form. Suppose that it is known that the constraints of the given linear
system are random in some sense and that it is known that a solution of very low rank exists.
Then the nuclear norm relaxation recovers the (unique) solution of minimum rank. We will
argue that, in the case that the graph G contains a planted k-disjoint-clique subgraph K and
not too many diversionary edges, a rank k solution, corresponding to the adjacency matrix
of K, of a system of linear equations defined by the input graph G can be recovered exactly
by solving a semidefinite program.

Like many of the papers mentioned in the previous paragraph, the proof that the convex 
relaxation exactly recovers the combinatorial solution constructs multipliers to establish that 
the combinatorial solution satisfies the KKT optimality conditions of the convex problem. 
Herein we introduce a new technical method for the construction of multipliers. In [2], the 
multipliers are constructed according to simple formulas because of the fairly simple nature 
of the problem. On the other hand, in [5], the multipliers are constructed by projection 
(i.e., solving a linear least-squares problem), which entails a quite difficult analysis. This 
paper introduces a technique of intermediate complexity: we construct the multipliers as the 
solution to a system of invertible linear equations. We show that the equations are within 
a certain norm distance of much simpler (diagonal plus rank-one) linear equations. Finally, 
the result is obtained from standard bounds on the perturbation of the solution of a linear 
system due to perturbation in its coefficients. 



2 The Maximum Node k-disjoint-clique Problem

Let G = (V, E) be a simple graph. The maximum node k-disjoint-clique problem
focuses on finding k disjoint cliques in G such that the total number of nodes in these
cliques is maximized. We call a subgraph of G composed of k disjoint cliques a
"k-disjoint-clique subgraph". This problem is clearly NP-hard since when k = 1 it is equivalent to the
maximum clique problem.

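The problem of maximizing the number of nodes in a k-disjoint-clique subgraph of G has
the following convex relaxation:

\[
\begin{array}{ll}
\max & \displaystyle\sum_{i=1}^{N} \sum_{j=1}^{N} X_{ij} \\
\text{s.t.} & Xe \le e, \\
 & X_{ij} = 0 \quad \forall (i,j) \notin E \text{ s.t. } i \ne j, \\
 & \operatorname{tr}(X) = k, \\
 & X \succeq 0,
\end{array}
\tag{1}
\]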

where $N = |V|$ and X is an N x N symmetric matrix. The notation "$X \succeq 0$" means
that X is positive semidefinite. Suppose that there exists a k-disjoint-clique subgraph of G
composed of cliques of sizes $r_1, \dots, r_k$. Then this k-disjoint-clique subgraph corresponds to the
feasible solution $X^* \in \mathbb{R}^{N \times N}$ for (1) where

\[
X^*_{ij} = \begin{cases} 1/r_\ell, & \text{if both } i, j \text{ belong to clique } \ell, \\ 0, & \text{otherwise.} \end{cases}
\tag{2}
\]

Moreover, the objective value of (1) corresponding to $X^*$ is equal to the number of nodes
in the k-disjoint-clique subgraph and $\operatorname{rank}(X^*) = k$. Using the Karush-Kuhn-Tucker
conditions, we will derive conditions for which $X^*$ corresponding to a k-disjoint-clique subgraph
of G, as defined by (2), is optimal for the convex relaxation of the maximum node
k-disjoint-clique problem given by (1). In particular, these conditions are summarized by the following
theorem.



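Theorem 2.1 Let $X^*$ be feasible for (1). Suppose also that there exist $\lambda \in \mathbb{R}^N_+$, $\mu \in \mathbb{R}$,
$\eta \in \mathbb{R}^{N \times N}$ and $S \in \Sigma^N_+$ such that

\[ -ee^T + \lambda e^T + e\lambda^T + \mu I + \sum_{(i,j) \notin E,\ i \ne j} \eta_{ij}\, e_i e_j^T = S, \tag{3} \]
\[ \lambda^T (X^* e - e) = 0, \tag{4} \]
\[ \langle S, X^* \rangle = 0. \tag{5} \]

Here $\Sigma^N_+$ is the cone of N x N positive semidefinite matrices, $\langle \cdot, \cdot \rangle$ is the trace inner product
on $\mathbb{R}^{N \times N}$ defined by

\[ \langle Y, Z \rangle = \operatorname{tr}(Y Z^T) \]

for all $Y, Z \in \mathbb{R}^{N \times N}$, $e_i$ denotes the ith column of the identity matrix in $\mathbb{R}^{N \times N}$ for all
$i = 1, \dots, N$, and e is the all-ones vector in $\mathbb{R}^N$. Then $X^*$ is an optimal solution of (1).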


We omit the proof of this theorem, as it is nothing more than the specialization of the
KKT conditions [4] in convex programming to (1).
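
As a sanity check on the claims about the planted solution, the sketch below builds $X^*$ from (2) for a small hypothetical instance and verifies the linear constraints of (1), the trace, and the objective value in exact rational arithmetic. The instance (two planted triangles and one noise node) and the helper name `planted_solution` are assumptions made for illustration. Positive semidefiniteness is not tested here; it follows from $X^*$ being a nonnegative combination of outer products of clique indicator vectors.

```python
from fractions import Fraction

def planted_solution(n, cliques):
    """Build X* from (2): entry 1/r_l inside clique l, 0 elsewhere."""
    X = [[Fraction(0)] * n for _ in range(n)]
    for clique in cliques:
        r = len(clique)
        for i in clique:
            for j in clique:
                X[i][j] = Fraction(1, r)
    return X

n = 7                                # nodes 0..6; node 6 is a noise node
cliques = [[0, 1, 2], [3, 4, 5]]     # hypothetical planted cliques, k = 2
X = planted_solution(n, cliques)

row_sums = [sum(row) for row in X]             # entries of X e
trace = sum(X[i][i] for i in range(n))         # should equal k
objective = sum(row_sums)                      # should equal covered nodes

print(all(s <= 1 for s in row_sums), trace == len(cliques), objective == 6)
```

The zero pattern required by (1) holds automatically, because $X^*$ is nonzero only on pairs of nodes inside a common clique, and such pairs are always edges of G.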

2.1 Construction of the auxiliary matrix $\tilde{S}$

Our proof technique to show that $X^*$ is optimal for (1) is to construct multipliers to
satisfy Theorem 2.1. The difficult multiplier to construct is S, the dual semidefinite matrix.
The reason is that S must simultaneously satisfy homogeneous linear equations given by
$\langle S, X^* \rangle = 0$, requirements on its entries given by the gradient equation (3), and positive
semidefiniteness.

In this subsection, we will lay the groundwork for our definition of S; in particular, we
construct an auxiliary matrix $\tilde{S}$. The actual multipliers used to prove the optimality of $X^*$,
as well as the proof itself, are in the next subsection.

Our strategy for satisfying the requirements on S is as follows. The matrix S will be
constructed in blocks with sizes inherited from the blocks of $X^*$. In particular, let the nodes
contained in the k planted cliques be denoted $C_1, \dots, C_k$, and let the remaining nodes be
$C_{k+1}$. Then according to (2), $X^*$ has diagonal blocks $X^*_{C_q,C_q}$ for $q = 1, \dots, k$ consisting of
multiples of the all 1's matrix. The remaining blocks of $X^*$ are 0's. The diagonal blocks of S
will be perturbations of the identity matrix, with the rank-one perturbation chosen so that
each diagonal block of S, say $S_{C_q,C_q}$, is orthogonal to $X^*_{C_q,C_q}$.

The entries of an off-diagonal block, say $S_{C_q,C_s}$, must satisfy, first of all, (3). This
constraint, however, is binding only on the entries corresponding to edges in G, since entries
corresponding to absent edges are not constrained by (3) thanks to the presence of the
unbound multiplier $\eta_{ij}$ on the left-hand side. These entries that are free in (3) are chosen so
that (5) is satisfied. It is a well-known result in semidefinite programming that the
requirements $\langle S, X \rangle = 0$, $X, S \in \Sigma^N_+$ together imply $SX = XS = 0$. Thus, the remaining entries
of S must be chosen so that $X^* S = S X^* = 0$. Because of the special form of $X^*$, this is
equivalent to requiring that all row and column sums of $S_{C_q,C_s}$ equal zero.

We parametrize the entries of $S_{C_q,C_s}$ that are not predetermined by (3) using the entries
of two vectors $y^{q,s}$ and $z^{q,s}$. These vectors are chosen to be the solutions to systems of linear
equations, namely, those imposed by the requirement that $X^* S = S X^* = 0$. We show that
the system of linear equations may be written as a perturbation of a linear system with
a known solution, and we can thus get bounds on $y^{q,s}$ and $z^{q,s}$. The bounds on $y^{q,s}$ and
$z^{q,s}$ in turn translate to bounds on $\|S_{C_q,C_s}\|$, which are necessary to establish the positive
semidefiniteness of S. This semidefiniteness is established by proving that the diagonal
blocks, which are identity matrices plus rank-one perturbations, dominate the off-diagonal
blocks.

Recalling our notation introduced earlier, G = (V, E) has a k-disjoint-clique subgraph K
composed of cliques $C_1, C_2, \dots, C_k$ of sizes $r_1, r_2, \dots, r_k$ respectively. Let $C_{k+1} := V \setminus (\cup_{i=1}^{k} C_i)$
be the set of nodes of G not in K and let $r_{k+1} := |C_{k+1}|$. Let $N := |V|$. Let
$\hat{r} := \min\{r_1, r_2, \dots, r_k\}$. For each $v \in V$, let $n^s_v$ denote the number of nodes adjacent to v in $C_s$
for all $s \in \{1, \dots, k+1\}$, and let cl(v) denote the index $i \in \{1, \dots, k+1\}$ such that $v \in C_i$.

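Let $A(\bar{G}) \in \mathbb{R}^{N \times N}$ be the adjacency matrix of the complement $\bar{G}$ of G; that is, $[A(\bar{G})]_{ij} = 1$
if $(i,j) \notin E$ and 0 otherwise. Next, fix $q, s \in \{1, \dots, k+1\}$ such that $q \ne s$. Let
$H = H_{q,s} \in \mathbb{R}^{C_q \times C_s}$ be the block of $A(\bar{G})$ with entries indexed by the vertex sets $C_q$ and $C_s$,
and let $D = D_{q,s} \in \mathbb{R}^{C_q \times C_q}$ be the diagonal matrix such that, for each $i \in C_q$, the (i,i)th
entry of D is equal to the number of nodes in $C_s$ not adjacent to i. That is,

\[ D = r_s I - \operatorname{Diag}(n^s_{C_q}), \]

where $n^s_{C_q} \in \mathbb{R}^{C_q}$ is the vector with ith entry equal to $n^s_i$ for each $i \in C_q$. Similarly, let
$F = F_{q,s} = r_q I - \operatorname{Diag}(n^q_{C_s})$. Next, define the scalar

\[
c = c_{q,s} := \begin{cases}
\dfrac{\hat{r}}{2}\left(\dfrac{1}{r_q} + \dfrac{1}{r_s}\right), & \text{if } q, s \le k, \\[2mm]
\dfrac{\hat{r}}{2}\left(\dfrac{1}{r_q} + \dfrac{1}{\hat{r}}\right), & \text{if } s = k+1,
\end{cases}
\]

which is the value $1 - \lambda_i - \lambda_j$ forced by (3) on entries corresponding to edges once the
multipliers $\lambda$ are chosen as in (17) and (18). Next, for each $q, s = 1, \dots, k+1$ such that $q \ne s$
let $b = b^{q,s} \in \mathbb{R}^{C_q \cup C_s}$ be defined by

\[
b_i = c \cdot \begin{cases} n^s_i, & \text{if } i \in C_q, \\ n^q_i, & \text{if } i \in C_s. \end{cases}
\]

Note that the matrix

\[ \begin{pmatrix} D & H \\ H^T & F \end{pmatrix} \]

is weakly diagonally dominant since the ith row of H contains exactly $r_s - n^s_i$ 1's, and, hence,
positive semidefinite. Further, let $y = y^{q,s}$ and $z = z^{q,s}$ be a solution of the perturbed system

\[
\begin{pmatrix} D + \theta ee^T & H - \theta ee^T \\ H^T - \theta ee^T & F + \theta ee^T \end{pmatrix}
\begin{pmatrix} y \\ z \end{pmatrix} = b
\tag{6}
\]

for some scalar $\theta > 0$ to be defined later.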
The rationale for this system of equations (6) is as follows. Below in (12), we shall
define the matrix $\tilde{S}_{C_q,C_s}$ according to the formula that entries (i, j) corresponding to edges
in E are set to $-c_{q,s}$, while entries corresponding to absent edges are set to the sum
$[y^{q,s}]_i + [z^{q,s}]_j$. Matrix $S_{C_q,C_s}$ has the same formula; refer to (19) below.

As mentioned earlier, it is required that all row and column sums of $S_{C_q,C_s}$ equal zero.
Consider, e.g., the sum of the entries in a particular row $i \in C_q$. This sum consists of $r_s$
terms; of these terms, $n^s_i$ of them are $-c_{q,s}$ (corresponding to edges from i to $C_s$) while the
other $r_s - n^s_i$ have the form $[y^{q,s}]_i + [z^{q,s}]_j$. Thus, the requirement that the row sum to zero
is written

\[ -n^s_i c_{q,s} + \sum_{j \in C_s;\ (i,j) \notin E} \left( [y^{q,s}]_i + [z^{q,s}]_j \right) = 0, \]

which may be rewritten

\[ (r_s - n^s_i)\,[y^{q,s}]_i + \sum_{j \in C_s;\ (i,j) \notin E} [z^{q,s}]_j = n^s_i c_{q,s}. \tag{7} \]

Equation (7) is exactly a row of (6) in the case $\theta = 0$ because of the formulas used to define
D, F, H, b.

In the case that $\theta$ is not zero, the equation for the ith row in (6) has an additional term of
the form $\theta(e^T y^{q,s} - e^T z^{q,s})$. This additional term does not affect the result, as the following
argument shows. The version of (6) with $\theta = 0$ is singular because the vector $(e; -e)$ is in
its null space. This corresponds to adding a scalar to each entry of $y^{q,s}$ and subtracting the
same scalar from each entry of $z^{q,s}$. One particular way to fix that scalar is to require that
the sum of entries of $y^{q,s}$ equals the sum of entries of $z^{q,s}$, i.e.,

\[ e^T y^{q,s} - e^T z^{q,s} = 0. \tag{8} \]

If for some $\theta > 0$ we are able to show that (6) is nonsingular (which we shall establish in
Section 3 and again in Section 4) then this particular $(y^{q,s}, z^{q,s})$ satisfying (7) and (8) will
also be a solution to (6) for nonzero $\theta$ since the additional term $\theta(e^T y^{q,s} - e^T z^{q,s})$ is zero.
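
The role of the regularized system (6) can be illustrated on a single off-diagonal block. The sketch below is a minimal example under stated assumptions: the block sizes, the two noise edges, the value used for c, and the helper `solve` (a plain Gauss-Jordan elimination over the rationals) are all chosen here for illustration. It solves (6) with $\theta = 1$, fills the block with $-c$ on edges and $y_i + z_j$ on non-edges, and checks that every row and column sum vanishes and that (8) holds.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [row[n] for row in A]

rq, rs = 3, 4                      # assumed sizes of C_q and C_s
noise = {(0, 0), (1, 1)}           # (i, j): node i of C_q adjacent to node j of C_s
theta, c = Fraction(1), Fraction(7, 8)   # illustrative values of theta and c

H = [[0 if (i, j) in noise else 1 for j in range(rs)] for i in range(rq)]
D = [sum(H[i]) for i in range(rq)]                         # r_s - n_i^s
F = [sum(H[i][j] for i in range(rq)) for j in range(rs)]   # r_q - n_j^q
b = [c * (rs - D[i]) for i in range(rq)] + [c * (rq - F[j]) for j in range(rs)]

M = []
for i in range(rq):    # block row [D + theta*ee^T, H - theta*ee^T]
    M.append([theta + (D[i] if i == k else 0) for k in range(rq)]
             + [H[i][j] - theta for j in range(rs)])
for j in range(rs):    # block row [H^T - theta*ee^T, F + theta*ee^T]
    M.append([H[i][j] - theta for i in range(rq)]
             + [theta + (F[j] if j == k else 0) for k in range(rs)])

x = solve(M, b)
y, z = x[:rq], x[rq:]
S = [[y[i] + z[j] if H[i][j] else -c for j in range(rs)] for i in range(rq)]
row_sums = [sum(row) for row in S]
col_sums = [sum(S[i][j] for i in range(rq)) for j in range(rs)]
print(row_sums, col_sums, sum(y) - sum(z))
```

In this example the non-edges between the two node sets form a connected bipartite graph, so the $\theta = 0$ system has a one-dimensional null space and the regularized system is nonsingular.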

For the remainder of this section, in order to formulate definitions for the remaining
multipliers, assume that $\theta > 0$ and that (6) is nonsingular. Furthermore, assume that
$D_{ii} > 0$ for all $i \in C_q$ and $F_{ii} > 0$ for all $i \in C_s$. Let

\[
A := \begin{pmatrix} D + \theta ee^T & 0 \\ 0 & F + \theta ee^T \end{pmatrix},
\qquad
P := \begin{pmatrix} 0 & H - \theta ee^T \\ H^T - \theta ee^T & 0 \end{pmatrix};
\]

then we have assumed that A + P is nonsingular, and

\[ \begin{pmatrix} y \\ z \end{pmatrix} = (A + P)^{-1} b. \]

The proof technique in Sections 3 and 4 is to show that $Q := (A + P)^{-1} - A^{-1}$ is small so that
(y, z) is close to $A^{-1} b$. Let $Q = (Q_1^T, Q_2^T)^T$ where $Q_1 \in \mathbb{R}^{C_q \times (C_q \cup C_s)}$ and $Q_2 \in \mathbb{R}^{C_s \times (C_q \cup C_s)}$.
Then, under this notation,

\[
\begin{pmatrix} y \\ z \end{pmatrix} =
\begin{pmatrix} (D + \theta ee^T)^{-1} & 0 \\ 0 & (F + \theta ee^T)^{-1} \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} +
\begin{pmatrix} Q_1 \\ Q_2 \end{pmatrix} b
\]

where $b_1 \in \mathbb{R}^{C_q}$, $b_2 \in \mathbb{R}^{C_s}$ are the vectors of entries of b corresponding to $C_q$ and $C_s$
respectively. Therefore, if D, F and A + P are nonsingular,

\[ y = (D + \theta ee^T)^{-1} b_1 + Q_1 b = (I + \theta D^{-1} ee^T)^{-1} D^{-1} b_1 + Q_1 b \]

and

\[ z = (I + \theta F^{-1} ee^T)^{-1} F^{-1} b_2 + Q_2 b. \]

Let $\bar{y} := y - Q_1 b$ and $\bar{z} := z - Q_2 b$. In order to give explicit formulas for $\bar{y}$ and $\bar{z}$ we use the
well-known Sherman-Morrison-Woodbury formula (see, for example, [10, Equation 2.1.4]),
stated in the following lemma, to calculate $(I + \theta D^{-1} ee^T)^{-1}$ and $(I + \theta F^{-1} ee^T)^{-1}$.

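Lemma 2.1 If A is a nonsingular matrix in $\mathbb{R}^{n \times n}$ and $u, v \in \mathbb{R}^n$ satisfy $v^T A^{-1} u \ne -1$,
then

\[ (A + uv^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}. \tag{9} \]

As an immediate corollary of Lemma 2.1, notice that

\[
\bar{y} = \left( D^{-1} - \frac{\theta D^{-1} ee^T D^{-1}}{1 + \theta e^T D^{-1} e} \right) b_1
= D^{-1} \left( I - \frac{\theta ee^T D^{-1}}{1 + \theta e^T D^{-1} e} \right) b_1
\tag{10}
\]

and

\[
\bar{z} = \left( F^{-1} - \frac{\theta F^{-1} ee^T F^{-1}}{1 + \theta e^T F^{-1} e} \right) b_2
= F^{-1} \left( I - \frac{\theta ee^T F^{-1}}{1 + \theta e^T F^{-1} e} \right) b_2.
\tag{11}
\]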
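
The rank-one update formula in (10) is cheap to evaluate because D is diagonal. The following minimal sketch, with illustrative values for the diagonal of D, the vector $b_1$ and $\theta$ (all assumptions, as is the helper name `sm_solve_diag`), evaluates the right-hand side of (10) and confirms by substitution that it solves $(D + \theta ee^T)\bar{y} = b_1$.

```python
from fractions import Fraction

def sm_solve_diag(d, theta, b):
    """Solve (Diag(d) + theta * e e^T) y = b via the rank-one update formula."""
    dinv_b = [bi / di for bi, di in zip(b, d)]      # D^{-1} b
    dinv_e = [1 / di for di in d]                   # D^{-1} e
    scale = theta * sum(dinv_b) / (1 + theta * sum(dinv_e))
    return [u - scale * w for u, w in zip(dinv_b, dinv_e)]

d = [Fraction(3), Fraction(3), Fraction(4)]         # illustrative diagonal of D
b1 = [Fraction(7, 8), Fraction(7, 8), Fraction(0)]  # illustrative right-hand side
theta = Fraction(1)

y_bar = sm_solve_diag(d, theta, b1)
# Substitute back: row i of (D + theta*ee^T) y is d_i*y_i + theta*sum(y).
residual = [d[i] * y_bar[i] + theta * sum(y_bar) - b1[i] for i in range(len(d))]
print(residual)
```

Each solve costs time linear in the block size, which is why the analysis can work with the explicit vectors $\bar{y}$ and $\bar{z}$ and treat $Q_1 b$ and $Q_2 b$ as perturbations.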

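Finally, we define the (k + 1) x (k + 1) block matrix $\tilde{S} \in \mathbb{R}^{N \times N}$ as follows:
($\sigma_1$) For all $q \in \{1, \dots, k\}$, let $\tilde{S}_{C_q,C_q} = 0$.
($\sigma_2$) For all $q, s \in \{1, \dots, k\}$ such that $q \ne s$, let

\[ \tilde{S}_{C_q,C_s} = H_{q,s} \circ \left( y^{q,s} e^T + e (z^{q,s})^T \right) + c_{q,s} \left( H_{q,s} - ee^T \right). \tag{12} \]

($\sigma_3$) For all $q \in \{1, \dots, k\}$ and $i \in C_q$, $j \in C_{k+1}$, let

\[
[\tilde{S}_{C_q,C_{k+1}}]_{ij} = [\tilde{S}_{C_{k+1},C_q}]_{ji} =
\begin{cases} -c_{q,k+1}, & \text{if } (i,j) \in E, \\ c_{q,k+1}\, n^q_j / (r_q - n^q_j), & \text{otherwise.} \end{cases}
\]

($\sigma_4$) Finally, for all $i, j \in C_{k+1}$, choose

\[
[\tilde{S}_{C_{k+1},C_{k+1}}]_{ij} =
\begin{cases} -1, & \text{if } (i,j) \in E \text{ or } i = j, \\ \gamma, & \text{if } (i,j) \notin E, \end{cases}
\tag{13}
\]

for some scalar $\gamma$ to be defined later.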
We make a couple of remarks about ($\sigma_2$). As already noted earlier, this formula defines
entries of $\tilde{S}_{C_q,C_s}$ to be $-c_{q,s}$ in positions corresponding to edges, and $[y^{q,s}]_i + [z^{q,s}]_j$ in other
positions. The vectors $y^{q,s}$ and $z^{q,s}$ are defined by (6) precisely so that, when used in this
manner to define $\tilde{S}_{C_q,C_s}$, its row and column sums are all 0 (so that $X^* S = S X^* = 0$;
the relationship $S_{C_q,C_s} = \tilde{S}_{C_q,C_s}$ is given by (19) below). The system is square because the
number of constraints on $\tilde{S}_{C_q,C_s}$ imposed by $X^* S = S X^* = 0$ after the predetermined entries
are filled in is $|C_q| + |C_s|$ (one constraint for each row and column), which is the total number
of entries in $y^{q,s}$ and $z^{q,s}$. As mentioned earlier, there is the slight additional complexity that
these $|C_q| + |C_s|$ equations have a dependence of dimension 1, which explains why we needed
to regularize (6) with the addition of the $\theta ee^T$ terms.

As a second remark about ($\sigma_2$), we note that $\tilde{S}_{C_q,C_s} = \tilde{S}_{C_s,C_q}^T$. This is a consequence of our
construction detailed above. In particular, $y^{q,s} = z^{s,q}$, $H_{q,s} = H_{s,q}^T$ and $D_{q,s} = F_{s,q}$ for all
$q, s = 1, \dots, k$ such that $q \ne s$.



2.2 Definition of the multipliers, optimality and uniqueness 

We finally come to the main theorem of this section, which provides a sufficient condition for
when the k-disjoint-clique subgraph of G composed of the cliques $C_1, \dots, C_k$ is the maximum
node k-disjoint-clique subgraph of G.

Theorem 2.2 Suppose that G = (V, E) has a k-disjoint-clique subgraph G* composed of the
disjoint cliques $C_1, \dots, C_k$ and let $C_{k+1} := V \setminus (\cup_{i=1}^{k} C_i)$. Let $r_i = |C_i|$ for all $i = 1, \dots, k+1$,
and let $\hat{r} = \min_{i=1,\dots,k}\{r_i\}$. Let $X^*$ be the matrix of the form (2) corresponding to the
k-disjoint-clique subgraph generated by $C_1, \dots, C_k$. Moreover, suppose that the matrix $\tilde{S}$ as
defined by ($\sigma_1$), ..., ($\sigma_4$) satisfies

\[ \|\tilde{S}\| \le \hat{r} - 1. \tag{14} \]

Then $X^*$ is optimal for (1), and G* is the maximum node k-disjoint-clique subgraph of G.
Moreover, if $\|\tilde{S}\| < \hat{r} - 1$ and

\[ n^q_v < r_q \tag{15} \]

for all $v \in V$ and $q \in \{1, \dots, k\} - cl(v)$, then $X^*$ is the unique optimal solution of (1) and
G* is the unique maximum node k-disjoint-clique subgraph of G.

Proof: We will prove that (14) is a sufficient condition for optimality of $X^*$ by defining
multipliers $\mu$, $\lambda$, $\eta$, and S and proving that if (14) holds then these multipliers satisfy the
optimality conditions given by Theorem 2.1. Let us define the multipliers $\mu$ and $\lambda$ by

\[ \mu = \hat{r} = \min\{r_1, r_2, \dots, r_k\}, \tag{16} \]
\[ \lambda_i = \frac{1}{2}\left(1 - \frac{\hat{r}}{r_q}\right) \quad \text{for all } i \in C_q, \tag{17} \]

for all $q = 1, \dots, k$ and

\[ \lambda_i = 0 \tag{18} \]

for all $i \in C_{k+1}$. Notice that by our choice of $\mu$ and $\lambda$ we have

\[ S_{C_q,C_q} = \hat{r} I - (\hat{r}/r_q)\, ee^T \]

for all $q = 1, \dots, k$ by (3). Moreover, we choose $\eta$ such that

\[
\eta_{ij} = \begin{cases} \tilde{S}_{ij} - \lambda_i - \lambda_j + 1, & \text{if } (i,j) \notin E,\ i \ne j, \\ 0, & \text{otherwise} \end{cases}
\]

for all $i, j \in V$. Note that, by our choice of $\eta$, we have

\[
S_{C_q,C_s} = \begin{cases}
\tilde{S}_{C_q,C_s}, & \text{for all } q, s \in \{1, \dots, k+1\} \text{ such that } q \ne s, \\
\tilde{S}_{C_{k+1},C_{k+1}} + \hat{r} I, & \text{for } q = s = k+1
\end{cases}
\tag{19}
\]

by (3).

By construction, $\mu$, $\lambda$, $\eta$, and S satisfy (3). Since the ith row sum of $X^*$ is equal to 1
for all $i \in C_q$ for all $q = 1, \dots, k$ and is equal to 0 for all $i \in C_{k+1}$, $X^*$ and $\lambda$ satisfy the
complementary slackness condition (4). Moreover,

\[
\langle X^*, S \rangle = \sum_{q=1}^{k} \sum_{i \in C_q} \sum_{j \in C_q} \frac{1}{r_q} S_{ij}
= \sum_{q=1}^{k} \frac{1}{r_q} \left( \hat{r}\, r_q - \frac{\hat{r}}{r_q}\, r_q^2 \right) = 0
\]

and thus $X^*$ and S satisfy (5). It remains to prove that (14) implies that S is positive
semidefinite.

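To prove that S is positive semidefinite we show that $x^T S x \ge 0$ for all $x \in \mathbb{R}^N$ if $\tilde{S}$
satisfies (14). Fix $x \in \mathbb{R}^N$ and decompose x as $x = x_1 + x_2$ where

\[
x_1(C_i) = \begin{cases} \varphi_i e, & i \in \{1, \dots, k\}, \\ 0, & i = k+1 \end{cases}
\]

for $\varphi \in \mathbb{R}^k$ chosen so that $x_2(C_i)^T e = 0$ for $i = 1, \dots, k$, and $x_2(C_{k+1}) = x(C_{k+1})$. Then, by our
choice of $x_1$ and $x_2$,

\[
\begin{aligned}
x^T S x &= x_2^T S x_2 \\
&= \hat{r}\, \|x_2(C_1 \cup \dots \cup C_k)\|^2 + (\hat{r} - 1) \|x_2(C_{k+1})\|^2 + x_2^T \tilde{S} x_2 \\
&\ge (\hat{r} - 1 - \|\tilde{S}\|)\, \|x_2\|^2.
\end{aligned}
\]

Therefore, S is positive semidefinite, and, hence, $X^*$ is optimal for (1) if $\|\tilde{S}\| \le \hat{r} - 1$.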

Now suppose that $\|\tilde{S}\| < \hat{r} - 1$ and, for all $i = 1, \dots, k$, no node in $C_i$ is adjacent to every
node in some other clique. Then $X^*$ is optimal for (1). For all $i = 1, \dots, k$, let $v_i \in \mathbb{R}^N$ be
the characteristic vector of $C_i$. That is,

\[
[v_i]_j = \begin{cases} 1, & \text{if } j \in C_i, \\ 0, & \text{otherwise.} \end{cases}
\]

Notice that $X^* = \sum_{i=1}^{k} (1/r_i)\, v_i v_i^T$. Moreover, by complementary slackness, $\langle X^*, S \rangle = 0$
and, thus, $v_i$ is in the null space of S for all $i = 1, \dots, k$. On the other hand, consider
nonzero $x \in \mathbb{R}^N$ such that $x^T v_i = 0$ for all $i = 1, \dots, k$. That is, x is orthogonal to the span
of $\{v_i : i = 1, \dots, k\}$. Then

\[
x^T S x = \hat{r}\, \|x(C_1 \cup \dots \cup C_k)\|^2 + (\hat{r} - 1) \|x(C_{k+1})\|^2 + x^T \tilde{S} x \ge (\hat{r} - 1 - \|\tilde{S}\|)\, \|x\|^2 > 0.
\]

Therefore, $\operatorname{Null}(S) = \operatorname{span}\{v_i : i = 1, \dots, k\}$ and $\operatorname{rank}(S) = N - k$.

Now suppose that X is also optimal for (1). Then, by complementary slackness, $\langle X, S \rangle = 0$,
which holds if and only if XS = 0. Therefore, the row and column spaces of X lie in the
null space of S. It follows immediately, since $X \succeq 0$, that X can be written in the form

\[
X = \sum_{i=1}^{k} \sigma_i v_i v_i^T + \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} \omega_{ij}\, v_i v_j^T
\]

for some $\sigma \in \mathbb{R}^k_+$ and $\omega \in \Sigma^k$, where $\Sigma^k$ denotes the set of k x k symmetric matrices. Now,
if $\omega_{ij} \ne 0$ for some $i \ne j$ then every entry in the block $X(C_i, C_j) = X(C_j, C_i)^T$ must be
equal to $\omega_{ij}$. Since each of these entries is nonzero, this implies that each node in $C_i$ is
adjacent to every node in $C_j$, contradicting Assumption (15). Therefore, X has singular
value decomposition $X = \sigma_1 v_1 v_1^T + \dots + \sigma_k v_k v_k^T$. Moreover, since X is optimal for (1) it
must have objective value equal to that of $X^*$ and thus

\[
\sum_{i=1}^{k} \sigma_i r_i^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} X_{ij} = \sum_{i=1}^{N} \sum_{j=1}^{N} X^*_{ij} = \sum_{i=1}^{k} r_i.
\tag{20}
\]

Further, since X is feasible for (1),

\[ \sigma_i r_i \le 1 \tag{21} \]

for all $i = 1, \dots, k$. Combining (20) and (21) shows that $\sigma_i = 1/r_i$ for all $i = 1, \dots, k$ and,
hence, $X = X^*$ as required. ■



3 The Adversarial Case 

Suppose that the graph G = (V, E) is generated as follows. We first add k disjoint cliques
with vertex sets $C_1, \dots, C_k$ of size $r_1, r_2, \dots, r_k$ respectively. Then, an adversary is allowed
to add a set $C_{k+1}$ of additional vertices and a number of the remaining potential edges
to the graph. We will show that our adversary can add up to $O(\hat{r}^2)$ noise edges where
$\hat{r} = \min\{r_1, \dots, r_k\}$ and the k-disjoint-clique subgraph composed of $C_1, \dots, C_k$ will still be
the unique maximum k-disjoint-clique subgraph of G.

The main theorem concerning the adversarial case is as follows. 

Theorem 3.1 Consider an instance of the k-disjoint-clique problem constructed according
to the preceding description, namely, G contains a k-disjoint-clique graph G* whose nodes
are partitioned as $C_1 \cup \dots \cup C_k$ where $|C_q| = r_q$, $q = 1, \dots, k$, plus additional nodes denoted
$C_{k+1}$ and additional edges (which may have endpoints chosen from any of $C_1, \dots, C_{k+1}$). Let
$\hat{r} = \min(r_1, \dots, r_k)$. Assume the following conditions are satisfied:

1. For all $q = 1, \dots, k$, $i \in C_q$, for all $s \in \{1, \dots, k+1\} - q$,

\[ n^s_i \le \delta \min(r_q, r_s). \tag{22} \]

Here, $\delta$ is a scalar satisfying (23) below.

2.

\[ |E(G) \setminus E(G^*)| \le \rho\, \hat{r}^2, \]

where $\rho$ is a positive scalar depending on $\delta$.

Then $X^*$ given by (2) is the unique optimal solution to (1) and G* is the unique optimal
solution of the k-disjoint-clique problem.

We remark that two of the conditions imposed in this theorem are, up to the constant
factors, the best possible according to the following information-theoretic arguments. First,
if $n^s_i = r_s$, then node i could be inserted into clique s, so the partitioning between $C_s$
and $C_q$ would no longer be uniquely determined. This shows the necessity of the condition
$n^s_i \le O(r_s)$. The condition that $|E(G) \setminus E(G^*)| \le \rho \hat{r}^2$ is necessary, up to the constant factor,
because if $|E(G) \setminus E(G^*)| \ge \hat{r}(\hat{r} - 1)/2$, then we could interconnect an arbitrary set of $\hat{r}$
nodes chosen from among the existing cliques with edges to make a new clique out of them,
again spoiling the uniqueness of the decomposition.

An argument for the necessity of the condition that $n^s_i \le \delta r_q$ is not apparent, so possibly
there is a strengthening of this theorem that drops that condition.

The remainder of this section is the proof of this theorem. As might be expected, the
proof hinges on establishing (14); once this inequality is established, then Theorem 2.2
completes the argument.

As before, let $r_{k+1}$ denote $|C_{k+1}|$. For the remainder of the proof, to simplify the notation,
we assume that $r_{k+1} \le 2\rho \hat{r}^2$. The reason is that since $|E(G) \setminus E(G^*)| \le \rho \hat{r}^2$ by assumption, if
$r_{k+1} > 2\rho \hat{r}^2$ then $C_{k+1}$ would include one or more isolated nodes (i.e., nodes of degree 0), and
these nodes can simply be deleted in a preliminary phase of the algorithm. (The algorithm
still works with an arbitrary number of isolated nodes in $G \setminus G^*$, but the notation in the
proof requires some needless additional complexity.)

Recall that the construction of the multipliers presented in Section 2 depended on two
scalars $\theta$ in (6) and $\gamma$ in (13): choose $\theta = 1$ and $\gamma = 0$.

We impose the assumption that $\delta \in (0, 0.382)$. The constant 0.382 is chosen so that

\[ 0 < \delta < (1 - \delta)^2. \tag{23} \]

We will show that, under this assumption, there exists some $\beta > 0$ depending only on $\delta$ such
that

\[ \|\tilde{S}_{C_q,C_s}\|^2 \le \beta\, \|b^{q,s}\|_1 \]

for all $q, s \in \{1, \dots, k\}$ such that $q \ne s$.
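
The constant in (23) comes from a quadratic: $\delta < (1 - \delta)^2$ is equivalent to $\delta^2 - 3\delta + 1 > 0$, which for $\delta \in (0, 1)$ holds exactly when $\delta < (3 - \sqrt{5})/2 \approx 0.38197$, so 0.382 is this threshold rounded to three decimals. The short sketch below, with the helper name `satisfies_23` chosen here for illustration, evaluates the threshold and tests two sample values on either side of it.

```python
import math

# Smaller root of d^2 - 3d + 1 = 0, the boundary of condition (23).
threshold = (3 - math.sqrt(5)) / 2

def satisfies_23(delta):
    """Check the condition 0 < delta < (1 - delta)^2 from (23)."""
    return 0 < delta < (1 - delta) ** 2

print(round(threshold, 5))
print(satisfies_23(0.38), satisfies_23(0.39))
```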

Choose $q, s \in \{1, \dots, k\}$ such that $q \ne s$ and let D, F, H, b, and c be defined as in
Section 2.1. Without loss of generality we may assume that $r_q \le r_s$. Moreover, let y and z
be the solution of the system (6) and define A, Q, P as in Section 2.1.

We begin by showing that, under this assumption, y and z are uniquely determined.
Note that, since $n^s_i = r_s - D_{ii} \le \delta r_s$ for all $i \in C_q$ and $n^q_i = r_q - F_{ii} \le \delta r_q$ for all $i \in C_s$ by
Assumption (22), D and F are nonsingular and hence A is nonsingular. Moreover,

\[ A + P = A(I + A^{-1} P) \]

and, hence, A + P is nonsingular if $\|A^{-1} P\| < 1$. Note that, for all $t > 0$, we have

\[ \lambda_{\min}(D + t ee^T) \ge \lambda_{\min}(D) = \min_{i \in C_q} D_{ii} \tag{24} \]

since $ee^T \succeq 0$, where $\lambda_{\min}(D + t ee^T)$ is the smallest eigenvalue of the symmetric matrix
$D + t ee^T$. Taking t = 1 in (24) shows that

\[ \|(D + ee^T)^{-1}\| \le \|D^{-1}\| = \frac{1}{\min_{i \in C_q} D_{ii}} \le \frac{1}{(1 - \delta) r_s} \tag{25} \]

since, for each $i \in C_q$, we have

\[ (1 - \delta) r_s \le D_{ii} \le r_s \]

by Assumption (22). Similarly,

\[ \|(F + ee^T)^{-1}\| \le \|F^{-1}\| = \frac{1}{\min_{i \in C_s} F_{ii}} \le \frac{1}{(1 - \delta) r_q}. \tag{26} \]

Combining (25) and (26) we have

\[
\|A^{-1}\| = \frac{1}{\min\{\lambda_{\min}(D + ee^T),\ \lambda_{\min}(F + ee^T)\}} \le \frac{1}{(1 - \delta) \min\{r_q, r_s\}} = \frac{1}{(1 - \delta) r_q}.
\tag{27}
\]

On the other hand,

\[
\|P\| = \|H - ee^T\| \le \|H - ee^T\|_F = \Big( \sum_{i \in C_q} \sum_{j \in C_s} (H_{ij} - 1)^2 \Big)^{1/2} \le \sqrt{\delta}\, r_q
\tag{28}
\]

since $H_{ij} - 1$ is equal to $-1$ in the case that $(i,j) \in E$ and 0 otherwise and there are at most $\delta r_q^2$
edges between $C_q$ and $C_s$ by Assumption (22). Therefore, since $\delta < (1 - \delta)^2$ by Assumption
(23), we have

\[ \|A^{-1} P\| \le \|A^{-1}\|\, \|P\| \le \frac{\sqrt{\delta}}{1 - \delta} < 1 \]

and, thus, A + P is nonsingular and y and z are uniquely determined.
Now, recall that

\[ \tilde{S}_{C_q,C_s} = H \circ (y e^T + e z^T) - c\,(ee^T - H). \]

In order to calculate an upper bound on $\|\tilde{S}_{C_q,C_s}\|$ we write $\tilde{S}_{C_q,C_s}$ as

\[ \tilde{S}_{C_q,C_s} = m_1 + m_2 + m_3 + m_4 + m_5 \tag{29} \]

where

\[
\begin{aligned}
m_1 &:= H \circ (\bar{y} e^T), \quad m_2 := H \circ (e \bar{z}^T), \quad m_3 := H \circ (Q_1 b\, e^T), \\
m_4 &:= H \circ (e (Q_2 b)^T), \quad m_5 := -c\,(ee^T - H)
\end{aligned}
\tag{30}
\]

and apply the triangle inequality to obtain

\[ \|\tilde{S}_{C_q,C_s}\| \le \sum_{i=1}^{5} \|m_i\|. \tag{31} \]

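Throughout our analysis of $\|\tilde{S}_{C_q,C_s}\|$ we will use the following series of inequalities. For any
$W \in \mathbb{R}^{m \times n}$, $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$, we have

\[
\|W \circ uv^T\| = \|\operatorname{Diag}(u) \cdot W \cdot \operatorname{Diag}(v)\| \le \|\operatorname{Diag}(u)\|\, \|\operatorname{Diag}(v)\|\, \|W\|
= \|u\|_\infty \|v\|_\infty \|W\|.
\tag{32}
\]

On the other hand,

\[
\begin{aligned}
\|W \circ uv^T\| &= \|\operatorname{Diag}(u) \cdot W \cdot \operatorname{Diag}(v)\| \\
&\le \|v\|_\infty \|\operatorname{Diag}(u) \cdot W\| \le \|v\|_\infty \|\operatorname{Diag}(u) \cdot W\|_F \\
&= \|v\|_\infty \Big( \sum_{i=1}^{m} u_i^2\, \|W(i,:)\|^2 \Big)^{1/2}
\end{aligned}
\tag{33}
\]
\[ \le \|u\|\, \|v\|_\infty \max_{i=1,\dots,m} \|W(i,:)\| \tag{34} \]

and

\[ \|W \circ uv^T\| \le \|u\|_\infty \|v\| \max_{j=1,\dots,n} \|W(:,j)\| \tag{35} \]

where W(i,:) and W(:,j) denote the ith row and jth column of W.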
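
The identity $W \circ uv^T = \operatorname{Diag}(u)\, W \operatorname{Diag}(v)$ and the Frobenius-norm step behind (33) and (34) are easy to check on a small example. The sketch below uses illustrative data and hypothetical helper names (`hadamard_rank_one`, `diag_scale`); it compares the two sides of the identity entry by entry and evaluates the squared quantities $\|\operatorname{Diag}(u) W\|_F^2$ and $\|u\|^2 \max_i \|W(i,:)\|^2$. It does not compute spectral norms.

```python
def hadamard_rank_one(W, u, v):
    """Entrywise product of W with the rank-one matrix u v^T."""
    return [[W[i][j] * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]

def diag_scale(W, u, v):
    """Diag(u) * W * Diag(v), computed by ordinary matrix products."""
    m, n = len(u), len(v)
    Du = [[u[i] if i == k else 0 for k in range(m)] for i in range(m)]
    Dv = [[v[j] if j == k else 0 for k in range(n)] for j in range(n)]
    DuW = [[sum(Du[i][k] * W[k][j] for k in range(m)) for j in range(n)]
           for i in range(m)]
    return [[sum(DuW[i][k] * Dv[k][j] for k in range(n)) for j in range(n)]
            for i in range(m)]

W = [[1, 0, 1], [1, 1, 0]]    # illustrative 0/1 matrix, like a block H
u = [2, -1]
v = [3, 1, 4]

lhs = hadamard_rank_one(W, u, v)
rhs = diag_scale(W, u, v)

# ||Diag(u) W||_F^2 = sum_i u_i^2 ||W(i,:)||^2  <=  ||u||^2 max_i ||W(i,:)||^2
row_sq = [sum(x * x for x in row) for row in W]
frob_sq = sum(u[i] ** 2 * row_sq[i] for i in range(len(u)))
bound_sq = sum(ui ** 2 for ui in u) * max(row_sq)
print(lhs == rhs, frob_sq <= bound_sq)
```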

We begin with Applying the bound (|34|) with W = H, u = y, and v = e we have 

|| m i|| 2 < max.Djj||y|| 2 . (36) 

1 /2 

Here, we used the fact that max igC - 9 \\H(i, :)|| = maXj gGj since the zth row of H contains 
exactly r s — n\ l's. Thus, since 

||y|| < IKD + ee^-Ulb,!! < . I|bl " < -^L 

mm ieCq Da (1 - 6)r s 

it follows immediately that 

l|mi|| 2 < 1 Hbxf (37) 
(1 - 6) 2 r s 

since Da < r s for all i E C q . By an identical calculation, we have 

1MI 2 < tz * 2 ||b 2 || 2 . (38) 
(1 - S) 2 r q 

Next, applying (|34|) with W = if, u = Qib, v = e yields 

||m 3 || 2 < max Ai||Qib|| 2 < r s ||Qib|| 2 < r 8 \\Q x || 2 ||b|| 2 . 

^&C q 

since maXjDjj < r s . To derive an upper bound on ||?r?.3|| 2 we first derive an upper bound on 

HQill- 

Note that 

oo 

Q = (A + P)- 1 - A' 1 = {{I + A^P)- 1 - I)A- 1 = Y^-A^PfA- 1 (39) 
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since (/ + X) 1 = Y2^o(~-^Y f° r an X such that ||X|| < 1 by Taylor's Theorem. Notice 
that 

A \P2 

where 

P 1 = (D + 6ee T )-\H - 6ee T ), P 2 = (F + Oee T )-\H T - Oee T ) 
It follows immediately that 



£=0 



(PP 2 ) m 
(P 2 P 



) V/ o (WMWi 
) + V (P^)*p 2 o J J A 



since, for any integer i > 1 



Pi 
P 2 



Therefore, 



f / (PP 2 ) £ / 2 



o (p 2 Pi)^ /2 ; ' 
o (p 1 p 2 Y £ - i y 2 p 1 



{ V (p 2 Pi) ( ^ 1)/2 p 2 



if £ even 



if I odd. 



(40) 



HQill <\\(D + Oee^W ^ HPP.f + HPH ||(P + ^ee^)" 1 ]! ^ ||pP 2 f (41) 



e=i 



£=0 



and 



IIQ2II < \\(F + eee T )- 1 \\J2\\PiP2\\ i +\\P2\\\\(D + eee T )- 1 \\J2\\PiP2\\ e - (42) 



£=1 



Substituting (P25D, flU} and (FJSD into gU yields 



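\[
\|Q_1\| \le \frac{\tilde c}{\sqrt{r_q r_s}} \tag{43}
\]

where

\[
\tilde c = \frac{2 \max\{\delta/(1 - \delta), \sqrt{\delta}\}}{(1 - \delta)^2 - \delta}
\]

since

\[
\|P_1P_2\| \le \|H - \theta ee^T\|^2 \, \|D^{-1}\| \, \|F^{-1}\| \le \frac{\delta}{(1 - \delta)^2}.
\]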

Note that Assumption (23) ensures that the infinite series in (43) converge. It follows that

\[
\|m_3\|^2 \le \frac{\tilde c^2}{r_q} \|b\|^2. \tag{44}
\]
On the other hand,

\[
\|Q_2\| \le \tilde c / r_q
\]

since $\sqrt{r_q r_s} \ge r_q$. Thus, applying (35) with $W = H$, $u = e$, $v = Q_2 b$ we have

\[
\|m_4\|^2 \le r_q \|Q_2\|^2 \|b\|^2 \le \frac{\tilde c^2}{r_q} \|b\|^2.
\]

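Finally,

\[
\|m_5\|^2 = \|c(H - ee^T)\|^2 \le \|c(H - ee^T)\|_F^2 = c^2 \sum_{i \in C_q} \sum_{j \in C_s} (H_{ij} - 1)^2 = c^2 \sum_{i \in C_q} n_i^s = c \|b_1\|_1.
\]

Therefore, there exists $\beta \in \mathbf{R}$ such that

\[
\|S_{C_q,C_s}\|^2 \le \beta \left( \frac{\|b\|^2}{r_q} + \|b\|_1 \right)
\]

where $\beta$ depends only on $\delta$. Moreover, since $\|b\|^2 \le \|b\|_1 \|b\|_\infty$ and

\[
\|b\|_\infty = c \cdot \max\Big\{ \max_{i \in C_q} n_i^s, \; \max_{i \in C_s} n_i^q \Big\} \le \delta c \min\{r_q, r_s\} = \delta c r_q
\]

by Assumption (22), there exists $\hat\beta$ depending only on $\delta$ such that

\[
\|S_{C_q,C_s}\|^2 \le \hat\beta \|b\|_1
\]

as required.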

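Next, consider $S_{C_q,C_{k+1}}$ for some $q \in \{1, \dots, k\}$. Recall that

\[
[S_{C_q,C_{k+1}}]_{ij} =
\begin{cases}
-c, & \text{if } (i,j) \in E \\
c \, n_j / (r_q - n_j), & \text{otherwise}
\end{cases}
\]

where $n_j = n_j^q$ is the number of edges from $j \in C_{k+1}$ to $C_q$ for each $j \in C_{k+1}$. Hence,

\[
\|S_{C_q,C_{k+1}}\|^2 \le \|S_{C_q,C_{k+1}}\|_F^2
= \sum_{j \in C_{k+1}} \left( n_j c^2 + (r_q - n_j) \left( \frac{n_j c}{r_q - n_j} \right)^2 \right)
= \sum_{j \in C_{k+1}} \frac{c^2 n_j r_q}{r_q - n_j}
\le \frac{c^2}{1 - \delta} |E(C_q, C_{k+1})|
\]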
where $E(C_q, C_{k+1})$ is the set of edges from $C_q$ to $C_{k+1}$. Similarly, by our choice of $\gamma$, we have

\\Sc k+1 ,c k+1 \\ = \\Sc k+1 ,c k+1 - rl\\ 2 < \\S Ck+1< c k+1 - rI\\ 2 F = r k+l + 2\E(C k+1 ,C k+1 )\. (52) 

Let $B$ be the vector obtained by concatenating $b_{q,s}$ for all $q, s \in \{1, \dots, k\}$. Then, there exist scalars $c_1, c_2 \in \mathbf{R}$ depending only on $\delta$ such that

fc+1 fc+1 fc 

y^y^ ii^g g ,c 3 ii 2 = 11^,^11 2 + 11^,^+1 11 2 + h^+u+i ~^n 2 

9=1 8=1 g,se{l,...,fe} 9=1 

k+l 

< c 1 \\B\\ 1 + c 2 ^2\E(C q ,C k+1 )\ +r k+1 
9=1 

by the three preceding bounds. It follows that, since $\|b_{q,s}\|_1 \le |E(C_q, C_s)|$ for all $q, s \in \{1, \dots, k\}$ such that $q \ne s$, there exists $c_3 > 0$ depending only on $\delta$ such that

\[
\sum_{q=1}^{k+1} \sum_{s=1}^{k+1} \|S_{C_q,C_s}\|^2 \le c_3 R + r_{k+1}
\]

where $R := |E(G) \setminus E(G^*)|$ is the number of edges of $G$ not contained in the $k$-disjoint-clique subgraph $G^*$ composed of $C_1, \dots, C_k$. The hypothesis of the theorem is that $R \le \rho \hat r^2$. We have also assumed earlier that $r_{k+1} \le 2 \rho \hat r^2$. Hence the sum of the squares of the 2-norms of the blocks of $S$ is at most $(c_3 + 2) \rho \hat r^2$. Therefore, there exists some $\rho > 0$ depending only on $\delta$ such that the preceding inequality implies $\|S\| \le \hat r - 1$. This proves the theorem.

4 The Randomized Case 

Let $C_1, C_2, \dots, C_{k+1}$ be disjoint vertex sets of sizes $r_1, \dots, r_{k+1}$ respectively, and let $V = \cup_{i=1}^{k+1} C_i$. We construct the edge set of the graph $G = (V, E)$ as follows:

($\Omega_1$) For each $q = 1, \dots, k$, and each $i \in C_q$, $j \in C_q$ such that $i \ne j$, we add $(i, j)$ to $E$.

($\Omega_2$) Each of the remaining possible edges is added to $E$ independently at random with probability $p \in (0, 1)$.
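The construction ($\Omega_1$), ($\Omega_2$) can be sketched as a small generator. This is an illustrative sketch using only the Python standard library; the function name `planted_clique_graph` and its interface are assumptions made for this example, with the last entry of `sizes` playing the role of $r_{k+1}$ (the noise nodes).

```python
import random
from itertools import combinations

def planted_clique_graph(sizes, p, seed=None):
    """Build the edge set of G for vertex sets C_1, ..., C_{k+1} of the given sizes.

    sizes = [r_1, ..., r_k, r_{k+1}]; the last block holds the noise nodes.
    Returns (blocks, edges), where blocks are ranges of vertex ids.
    """
    rng = random.Random(seed)
    blocks, start = [], 0
    for r in sizes:
        blocks.append(range(start, start + r))
        start += r
    label = {v: q for q, block in enumerate(blocks) for v in block}
    noise_block = len(sizes) - 1
    edges = set()
    for i, j in combinations(range(start), 2):
        # (Omega_1): every pair inside a planted clique C_1, ..., C_k is an edge.
        in_same_clique = label[i] == label[j] and label[i] != noise_block
        # (Omega_2): every remaining pair is an edge independently with probability p.
        if in_same_clique or rng.random() < p:
            edges.add((i, j))
    return blocks, edges

blocks, edges = planted_clique_graph([4, 3, 5], p=0.3, seed=1)
```

With `p = 0.0` only the planted clique edges remain, which gives a quick way to check the ($\Omega_1$) step in isolation.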

Notice that, by our construction of $E$, the graph $G = (V, E)$ has a $k$-disjoint-clique subgraph $G^*$ with cliques indexed by the vertex sets $C_1, \dots, C_k$. We wish to determine which random graphs $G$ generated according to ($\Omega_1$) and ($\Omega_2$) have maximum node $k$-disjoint-clique subgraph equal to $G^*$ and can be found with high probability via solving the convex relaxation. We begin by providing a few results concerning random matrices with independently identically distributed (i.i.d.) entries of mean 0.
4.1 Results on norms of random matrices



Consider the probability distribution $\Omega$ for a random variable $x$ defined as follows:

\[
x = \begin{cases} 1 & \text{with probability } p, \\ -p/(1 - p) & \text{with probability } 1 - p. \end{cases}
\]

It is easy to check that the mean of $x$ is 0 and the variance of $x$ is $\sigma^2 = p/(1 - p)$. In this section we provide a few results concerning random matrices with entries independently identically distributed (i.i.d.) according to $\Omega$. We first recall a theorem of Geman [9] which provides a bound on the largest singular value of a random matrix with i.i.d. entries of mean 0.

Theorem 4.1 Let $A$ be a $\lceil yn \rceil \times n$ matrix whose entries are chosen according to $\Omega$ for fixed $y \in \mathbf{R}_+$. Then, with probability at least $1 - c_1 \exp(-c_2 n^{c_3})$ where $c_1 > 0$, $c_2 > 0$, and $c_3 > 0$ are constants depending on $p$ and $y$,

\[
\|A\| \le c_4 \sqrt{n}
\]

for some constant $c_4$.

Note that this theorem is not stated exactly in this form in [9], but can be deduced 
by taking k = n q for a q satisfying (2a + A)q < 1 in the equations on pp. 255-256. A 
similar theorem due to Füredi and Komlós [7] exists for symmetric matrices $A$ with entries distributed according to $\Omega$.

Theorem 4.2 For all integers $i, j$, $1 \le j \le i \le n$, let $A_{ij}$ be distributed according to $\Omega$. Define symmetrically $A_{ij} = A_{ji}$ for all $i < j$.

Then the random symmetric matrix $A = [A_{ij}]$ satisfies

\[
\|A\| \le 3 \sigma \sqrt{n}
\]

with probability at least $1 - \exp(-c n^{1/6})$ for some $c > 0$ that depends on $\sigma$.

As in Theorem 4.1, the theorem is not stated exactly in this manner in [7]; the stated form of the theorem can be deduced by taking $k = (\sigma/K)^{1/3} n^{1/6}$ and $v = \sigma \sqrt{n}$ in the inequality

\[
P(\max |\lambda| > 2 \sigma \sqrt{n} + v) < \sqrt{n} \exp(-kv/(2\sqrt{n} + v))
\]

on p. 237.
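The distribution $\Omega$ and the $\sqrt{n}$ scaling in Theorems 4.1 and 4.2 can be illustrated empirically. The following sketch assumes NumPy; the helper `sample_omega`, the values $p = 0.3$ and $n = 400$, and the seed are illustrative choices, and a single random draw is of course not a substitute for the theorems.

```python
import numpy as np

def sample_omega(shape, p, rng):
    """Entries equal 1 with probability p and -p/(1-p) with probability 1-p (mean 0)."""
    return np.where(rng.random(shape) < p, 1.0, -p / (1.0 - p))

rng = np.random.default_rng(0)
p, n = 0.3, 400
sigma = np.sqrt(p / (1.0 - p))           # standard deviation of one entry

# Square matrix with i.i.d. entries (the setting of Theorem 4.1 with y = 1).
A = sample_omega((n, n), p, rng)
empirical_mean = A.mean()
empirical_var = A.var()
rect_ratio = np.linalg.norm(A, 2) / np.sqrt(n)

# Symmetric matrix built from the lower triangle (the setting of Theorem 4.2).
L = np.tril(A)
A_sym = L + L.T - np.diag(np.diag(A))
sym_ratio = np.linalg.norm(A_sym, 2) / (sigma * np.sqrt(n))
```

Here `sym_ratio` is the quantity that Theorem 4.2 bounds by 3 with high probability, and `rect_ratio` is an empirical stand-in for the constant $c_4$ of Theorem 4.1.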

Next, we state a version of the well-known Chernoff bounds which provides a bound on the tail distribution of a sum of independent Bernoulli trials (see [21, Theorems 4.4 and 4.5]).

Theorem 4.3 (Chernoff Bounds) Let $X_1, \dots, X_k$ be a sequence of $k$ independent Bernoulli trials, each succeeding with probability $p$ so that $E(X_i) = p$. Let $S = \sum_{i=1}^{k} X_i$ be the binomially distributed variable describing the total number of successes. Then for $\delta > 0$

\[
P\big( S > (1 + \delta) p k \big) \le \left( \frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}} \right)^{pk}. \tag{53}
\]

It follows that for all $a \in (0, p\sqrt{k})$,

\[
P(|S - pk| > a \sqrt{k}) \le 2 \exp(-a^2/p). \tag{54}
\]
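The upper-tail bound of Theorem 4.3 can be compared with the exact binomial tail for small cases. The sketch below uses only the standard library and assumes that (53) is the standard multiplicative Chernoff bound $P(S > (1+\delta)pk) \le (e^{\delta}/(1+\delta)^{1+\delta})^{pk}$, which is the form applied later in (61); the function names and the parameters $k = 200$, $p = 0.3$ are illustrative.

```python
from math import comb, exp

def binomial_upper_tail(k, p, threshold):
    """Exact P(S > threshold) for S ~ Binomial(k, p)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k + 1) if i > threshold)

def chernoff_upper_bound(k, p, delta):
    """(e^delta / (1 + delta)^(1 + delta))^(p k), the assumed right-hand side of (53)."""
    return (exp(delta) / (1 + delta) ** (1 + delta)) ** (p * k)

k, p = 200, 0.3
comparison = [
    (delta, binomial_upper_tail(k, p, (1 + delta) * p * k), chernoff_upper_bound(k, p, delta))
    for delta in (0.1, 0.5, 1.0)
]
```

Each tuple in `comparison` holds $\delta$, the exact tail probability and the Chernoff bound, so the slack in the bound can be inspected directly.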
The final theorem of this section is as follows (see [2, Theorem 2.4]).

Theorem 4.4 Let $A$ be an $n \times N$ matrix whose entries are chosen according to $\Omega$. Suppose also that $e \log N \le \sqrt{n}$. Let $\tilde A$ be defined as follows. For $(i,j)$ such that $A_{ij} = 1$, we define $\tilde A_{ij} = 1$. For entries such that $A_{ij} = -p/(1 - p)$, we take $\tilde A_{ij} = -n_j/(n - n_j)$, where $n_j$ is the number of 1's in column $j$ of $A$. Then there exist $c_1 > 0$ and $c_2 \in (0, 1)$ depending on $p$ such that

\[
P(\|A - \tilde A\|_F^2 \le c_1 N) \ge 1 - (2/3)^n - N c_2^n. \tag{55}
\]



4.2 A bound on $\|S\|$ in the randomized case

Suppose that the random graph $G = (V, E)$ containing $k$-disjoint-clique subgraph $G^*$ composed of cliques $C_1, \dots, C_k$ is constructed according to ($\Omega_1$) and ($\Omega_2$) with probability $p$. Let $N := |V|$, $C_{k+1} = V \setminus \cup_{i=1}^{k} C_i$ and $r_i = |C_i|$ for all $i = 1, \dots, k + 1$. Further, let $\theta = 1 - p$ and $\gamma = p/(1 - p)$. We begin by stating the main theorem of the section.

Theorem 4.5 Suppose that $G = (V, E)$ has a $k$-disjoint-clique subgraph $G^*$ composed of the cliques $C_1, \dots, C_k$ and let $C_{k+1} := V \setminus (\cup_{i=1}^{k} C_i)$. Let $r_i = |C_i|$ for all $i = 1, \dots, k + 1$ and suppose that $r_q \le \hat r^{3/2}$ for all $q = 1, 2, \dots, k$, where $\hat r = \min_{i=1,\dots,k} \{r_i\}$. Then there exists some $\beta_1, \beta_2 > 0$ depending only on $p$ such that

\[
\|S\| \le \beta_1 \left( \sum_{s=1}^{k} r_s^2 \right)^{1/2} \left( \sum_{q=1}^{k} \frac{1}{r_q} \right)^{1/2} + \beta_2 \sqrt{N} \tag{56}
\]

with probability tending exponentially to 1 as $\hat r$ approaches $\infty$.

This theorem is meant to be used in conjunction with Theorem 2.2. In particular, if the right-hand side of (56) is less than $\hat r - 1$, then the planted graph $G^*$ may be recovered.

It is clear from (14) and the second term on the right-hand side of (56) that the $k$-disjoint-clique subgraph cannot be found unless $N \le O(\hat r^2)$. We now give a few examples of values for $r_1, \dots, r_{k+1}$ that fulfill (56).

1. Consider the case $k = 1$, i.e., a single large clique. In this case, taking $r_1 = \mathrm{const} \cdot N^{1/2}$ satisfies (14) since the first term on the right is $O(N^{1/4})$.

2. Suppose $k > 1$ and $r_1 = \dots = r_k = \mathrm{const} \cdot N^{\alpha}$. In this case, the first parenthesized factor on the right in (56) is $O(k^{1/2} N^{\alpha})$ while the second is $O(k^{1/2} N^{-\alpha/2})$, and therefore the first term is $O(k N^{\alpha/2})$. For (14) to hold, we need this term to be $O(\hat r) = O(N^{\alpha})$, which is valid as long as $k \le N^{\alpha/2}$. We also need $\alpha \ge 1/2$ as noted above to handle the second term on the right. For example, for $\alpha = 1/2$ the algorithm can find as many as $N^{1/4}$ cliques of this size. For $\alpha = 2/3$, the algorithm can find as many as $N^{1/3}$ cliques of this size, which is the maximum possible since the cliques are disjoint and $N$ is the number of nodes.

3. The cliques may also be of different sizes. For example, if there is one large clique of size $O(N^{2/3})$ and $N^{1/6}$ smaller cliques of size $O(N^{1/2})$, then $\hat r = O(N^{1/2})$, the first parenthesized factor in (56) is $O(N^{2/3})$ while the second is $O(N^{-1/6})$, so the entire first term is $O(N^{1/2}) = O(\hat r)$.
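The orders of magnitude in these examples can be reproduced by evaluating the product of the two parenthesized factors in (56) for concrete clique sizes. The sketch below uses only the standard library; it leaves out the unspecified constants $\beta_1$, $\beta_2$, so it only illustrates growth rates, and the function names and sample sizes are assumptions made for this example.

```python
from math import sqrt

def first_term_factor(sizes):
    """(sum_s r_s^2)^(1/2) * (sum_q 1/r_q)^(1/2) over the k planted clique sizes."""
    return sqrt(sum(r * r for r in sizes)) * sqrt(sum(1.0 / r for r in sizes))

def ratio_to_smallest(sizes):
    """First-term factor divided by r_hat = min r_i; O(1) means it is O(r_hat)."""
    return first_term_factor(sizes) / min(sizes)

# Example 2 with N = 10^4, alpha = 1/2: k = N^(1/4) = 10 cliques of size N^(1/2) = 100.
equal_sizes = [100] * 10
# Example 3 with N = 10^6: one clique of size N^(2/3) = 10^4 and
# N^(1/6) = 10 cliques of size N^(1/2) = 10^3.
mixed_sizes = [10_000] + [1_000] * 10

equal_ratio = ratio_to_smallest(equal_sizes)
mixed_ratio = ratio_to_smallest(mixed_sizes)
```

For $k$ equal cliques of size $r$ the factor reduces to $k\sqrt{r}$, which matches the $O(kN^{\alpha/2})$ estimate in the second example.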

We note that the results for random noise in the $k$-disjoint-clique problem are much better than the results for adversary-chosen noise. In the case of adversary-chosen noise, the number of allowable noise edges is bounded above by a constant times the number of edges in the smallest clique. In the case of random noise, the number of allowable noise edges is the square of that quantity (e.g., if there are $N^{1/4}$ cliques each of size $N^{1/2}$, then the smallest clique has $N$ edges versus $N^2$ noise edges).

We do not know whether the bound in (56) is the best possible. For instance, there is no obvious barrier preventing the algorithm from recovering as many as $N^{1/2}$ planted cliques of size $N^{1/2}$ in a random graph, but our analysis does not carry through to this case.

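The remainder of this section is devoted to the proof of Theorem 4.5. We write $S$ as

\[
S = S_1 + S_2 + S_3 + S_4 + S_4^T
\]

where $S_i \in \mathbf{R}^{N \times N}$, $i = 1, \dots, 4$ are $(k + 1)$ by $(k + 1)$ block matrices such that

\[
S_1(C_q, C_s) = \begin{cases} S(C_q, C_s), & \text{if } q, s \in \{1, \dots, k\}, \; q \ne s \\ 0, & \text{otherwise} \end{cases}
\]

\[
S_2(C_q, C_s) = \begin{cases}
R(C_q, C_s), & \text{if } q, s \in \{1, \dots, k\} \\
\tilde S(C_q, C_{k+1}), & \text{if } s = k + 1, \; q \in \{1, \dots, k\} \\
\tilde S(C_{k+1}, C_s), & \text{if } q = k + 1, \; s \in \{1, \dots, k\} \\
S(C_{k+1}, C_{k+1}), & \text{if } q = s = k + 1
\end{cases}
\]

\[
S_3(C_q, C_s) = \begin{cases} -R(C_q, C_s), & \text{if } q, s \in \{1, \dots, k\} \\ 0, & \text{otherwise} \end{cases}
\]

\[
S_4(C_q, C_s) = \begin{cases} S(C_q, C_{k+1}) - \tilde S(C_q, C_{k+1}), & \text{if } s = k + 1, \; q \in \{1, \dots, k\} \\ 0, & \text{otherwise} \end{cases}
\]

where $R \in \mathbf{R}^{N \times N}$ is a symmetric random matrix with independently identically distributed entries such that

\[
R_{ij} = \begin{cases} -1, & \text{with probability } p \\ p/(1 - p), & \text{with probability } 1 - p \end{cases}
\]

and $\tilde S \in \mathbf{R}^{N \times N}$ such that

\[
\tilde S_{ij} = \begin{cases} -1, & \text{if } (i,j) \in E \\ p/(1 - p), & \text{otherwise.} \end{cases}
\]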
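Notice that, by Theorem 4.2, there exists some $\kappa_1, \kappa_2, \kappa_3 > 0$ such that

\[
P(\|S_2\| + \|S_3\| \ge \kappa_1 \sqrt{N}) \le \kappa_2 \exp(-\kappa_3 N^{1/6}). \tag{57}
\]

Moreover, by Theorem 4.4 there exists $\kappa_4 > 0$ and $\kappa_5, \kappa_6 \in (0, 1)$ such that

\[
P(\|S_4\| \ge \kappa_4 \sqrt{N}) \le \kappa_5^{\hat r} + N \kappa_6^{\hat r}. \tag{58}
\]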
Hence, there exists some scalar $\beta_2$ depending only on $p$ such that

\[
\|S\| \le \|S_1\| + \beta_2 \sqrt{N}
\]

with probability tending exponentially to 1 as $\hat r \to \infty$. It remains to prove that

\[
\|S_1\| \le \beta_1 \left( \sum_{s=1}^{k} r_s^2 \right)^{1/2} \left( \sum_{q=1}^{k} \frac{1}{r_q} \right)^{1/2}
\]

with probability tending exponentially to 1 as $\hat r$ approaches $\infty$.

To do so, consider two vertex sets $C_q$ and $C_s$ such that $q \ne s$. Without loss of generality we may assume that $r_q \le r_s$. Define $H = H_{q,s}$, $D = D_{q,s}$, $F = F_{q,s}$, $b = b_{q,s}$, $c = c_{q,s}$, $y$, $z$, $A$, and $P$ as in Section 2.1. The following theorem provides an upper bound on the spectral norm of $S(C_q, C_s)$ for $q \ne s$ that holds with probability tending exponentially to 1 as $\hat r$ approaches $\infty$.

Theorem 4.6 Suppose that $r_q$ and $r_s$ satisfy

\[
r_q \le r_s \le r_q^{3/2}. \tag{59}
\]

Then there exists $\tilde B_1 > 0$ depending only on $p$ such that

\[
\|S_1(C_q, C_s)\| = \|S(C_q, C_s)\| \le \tilde B_1 \frac{r_s}{\sqrt{r_q}} \tag{60}
\]

with probability tending exponentially to 1 as $\hat r$ approaches $\infty$.

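Recall that $S(C_q, C_s) = H \circ (ye^T + ez^T) - c(ee^T - H)$. We begin by showing that $A + P$ is nonsingular and, hence, $y$ and $z$ are uniquely determined. Let $\delta := (1 - p)/(2p)$. Recall that $n_i^s = r_s - D_{ii}$ corresponds to $r_s$ independent Bernoulli trials each succeeding with probability equal to $p$ and, hence,

\[
P(n_i^s > (1 + \delta) p r_s) = P(r_s - D_{ii} > (1 + \delta) p r_s) \le \left( \frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}} \right)^{p r_s} \tag{61}
\]

for each $i \in C_q$ by Theorem 4.3. Rearranging, we have that $D_{ii} \ge (\theta - \delta p) r_s$ with probability at least

\[
1 - \left( \frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}} \right)^{p r_s}
\]

for each $i \in C_q$. Similarly,

\[
P(n_i^q \le (1 + \delta) p r_q) = P(F_{ii} \ge (\theta - \delta p) r_q) \ge 1 - \left( \frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}} \right)^{p r_q} \tag{62}
\]

for all $i \in C_s$. Therefore, by the union bound, $r_s - D_{ii} \le (1 + \delta) p r_s$ for all $i \in C_q$ and $r_q - F_{ii} \le (1 + \delta) p r_q$ for all $i \in C_s$, and, hence, $D$, $F$ are nonsingular with probability at least

\[
1 - r_s \left( \frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}} \right)^{p r_q} - r_q \left( \frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}} \right)^{p r_s}
\ge 1 - (r_q + r_s) \left( \frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}} \right)^{p \hat r}. \tag{63}
\]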
Moreover, applying (21) shows that $D + \theta ee^T$ and $F + \theta ee^T$ are nonsingular and

\[
\|(D + \theta ee^T)^{-1}\| \le \|D^{-1}\| \le \frac{1}{(\theta - \delta p) r_s}, \tag{64}
\]

\[
\|(F + \theta ee^T)^{-1}\| \le \|F^{-1}\| \le \frac{1}{(\theta - \delta p) r_q} \tag{65}
\]

with probability at least (63). It follows immediately that $A$ is nonsingular and

\[
\|A^{-1}\| = \max\{ \|(D + \theta ee^T)^{-1}\|, \|(F + \theta ee^T)^{-1}\| \}
\le \frac{1}{(\theta - \delta p) \min\{r_q, r_s\}} = \frac{1}{(\theta - \delta p) r_q} \tag{66}
\]

with probability at least (63).

Recall that, in the case that $A$ is nonsingular, it suffices to prove that $\|A^{-1}\| \|P\| < 1$ to show that $A + P$ is nonsingular. Moreover, recall that $\theta = 1 - p$ is chosen to ensure that the entries of $H - \theta ee^T$ have expected value equal to 0. We can extend $H - \theta ee^T$ to an $r_s \times r_s$ random matrix $\tilde P$ with entries i.i.d. with expected value equal to 0 by adding $r_s - r_q$ rows with entries i.i.d. such that each additional entry takes value equal to $-\theta$ with probability $p$ and value equal to $p$ with probability $1 - p$. Therefore, by Theorem 4.1,

\[
\|P\| = \|H - \theta ee^T\| \le \|\tilde P\| \le \gamma_1 \sqrt{r_s} \tag{67}
\]

for some $\gamma_1 > 0$ depending only on $p$ with probability at least $1 - c_1 \exp(-c_2 r_s^{c_3})$ where $c_i > 0$ depend only on $p$. Combining (66), (67), (59) and applying the union bound shows that

\[
\|A^{-1}\| \|P\| \le \frac{\gamma_1 \sqrt{r_s}}{(\theta - \delta p) r_q} < 1
\]

with probability at least

\[
1 - (r_q + r_s) \left( \frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}} \right)^{p \hat r} - c_1 \exp(-c_2 r_s^{c_3})
\]

for sufficiently large $r_q$. Therefore, $A + P$ is nonsingular and $y$ and $z$ are uniquely determined with probability tending exponentially to 1 as $\hat r \to \infty$.
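The nonsingularity argument can be illustrated on one random instance. The sketch below assumes NumPy and assumes, as the formulas for $P_1$, $P_2$, (66) and (67) suggest, that $A$ is block diagonal with blocks $D + \theta ee^T$ and $F + \theta ee^T$, that $P$ has off-diagonal blocks $H - \theta ee^T$ and its transpose, and that $D$ and $F$ hold the row and column sums of $H$. The sizes $r_q = 200$, $r_s = 300$ and $p = 0.3$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
r_q, r_s, p = 200, 300, 0.3
theta = 1.0 - p

# H(i, j) = 1 when there is no edge between i in C_q and j in C_s (probability 1 - p).
H = (rng.random((r_q, r_s)) >= p).astype(float)
row_sums = H.sum(axis=1)                 # diagonal of D, each entry at most r_s
col_sums = H.sum(axis=0)                 # diagonal of F, each entry at most r_q

A = np.block([
    [np.diag(row_sums) + theta * np.ones((r_q, r_q)), np.zeros((r_q, r_s))],
    [np.zeros((r_s, r_q)), np.diag(col_sums) + theta * np.ones((r_s, r_s))],
])
H0 = H - theta                            # H - theta * e e^T, entries of mean zero
P = np.block([
    [np.zeros((r_q, r_q)), H0],
    [H0.T, np.zeros((r_s, r_s))],
])

norm_A_inv = np.linalg.norm(np.linalg.inv(A), 2)
norm_P = np.linalg.norm(P, 2)
product = norm_A_inv * norm_P             # A + P is nonsingular when this is below 1
diag_bound = 1.0 / min(row_sums.min(), col_sums.min())
```

In this instance `norm_A_inv` stays below `diag_bound`, mirroring (64) to (66), and `norm_P` coincides with the spectral norm of `H0`, mirroring the equality in (67).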

For the remainder of the section we assume that $A + P$ is nonsingular. We define $Q$, $Q_1$, $Q_2$, $\bar y$ and $\bar z$ as in Section 2.1. To find an upper bound on $\|S(C_q, C_s)\|$, we decompose $S(C_q, C_s)$ as

\[
S(C_q, C_s) = M_1 + M_2
\]

where $M_1 := H \circ (\bar y e^T + e \bar z^T) - c(ee^T - H)$ and $M_2 := H \circ (Q_1 b e^T + e b^T Q_2^T)$.

We first obtain an upper bound on the norm of $M_1$. We define $d \in \mathbf{R}^{C_q}$ to be the vector such that $d_i$ is the difference between the number of edges added between the node $i$ and $C_s$ and the expected number of such edges for each $i \in C_q$. That is, $d := n^s_{C_q} - p r_s e$.
Similarly, we let $f := n^q_{C_s} - p r_q e$. Note that, by our choice of $d$ and $f$, we have $r_s I - D = p r_s I + \mathrm{Diag}(d)$ and $r_q I - F = p r_q I + \mathrm{Diag}(f)$. Notice that for $\theta = 1 - p$ we have $D = \theta r_s I - \mathrm{Diag}(d)$. Expanding the expression for $\bar y$, we have



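\begin{align*}
\bar y = (D + \theta ee^T)^{-1} b_1
&= \left( D^{-1} - \frac{\theta D^{-1} ee^T D^{-1}}{1 + \theta e^T D^{-1} e} \right) b_1 \\
&= \frac{D^{-1}}{1 + \theta e^T D^{-1} e} \left( b_1 + \theta b_1 e^T D^{-1} e - \theta ee^T D^{-1} b_1 \right) \\
&= \frac{D^{-1}}{1 + \theta e^T D^{-1} e} \left( b_1 + \theta (b_1 e^T - e b_1^T) D^{-1} e \right)
\end{align*}

since $e^T D^{-1} b_1 = b_1^T D^{-1} e$. Substituting $b_1 = c(r_s e - \mathbf{d})$, where $\mathbf{d} := \mathrm{diag}(D)$, we have

\[
b_1 e^T - e b_1^T = c(e \mathbf{d}^T - \mathbf{d} e^T)
\]

and, hence,

\begin{align*}
\bar y &= \frac{c D^{-1}}{1 + \theta e^T D^{-1} e} \left( r_s e - \mathbf{d} + \theta (e \mathbf{d}^T - \mathbf{d} e^T) D^{-1} e \right) \\
&= \frac{c D^{-1}}{1 + \theta e^T D^{-1} e} \left( r_s e - \mathbf{d} + \theta ee^T e - \theta \mathbf{d} e^T D^{-1} e \right) \\
&= \frac{c D^{-1}}{1 + \theta e^T D^{-1} e} (r_s e + \theta r_q e) - ce \\
&= \left( \frac{c(r_s + \theta r_q)}{(1 + \theta e^T D^{-1} e) \theta r_s} - c \right) e + \frac{c(r_s + \theta r_q)}{(1 + \theta e^T D^{-1} e) \theta r_s} D^{-1} d
\end{align*}

since

\[
I = \frac{1}{\theta r_s} (D + \mathrm{Diag}(d)).
\]

Let $y_1 := \omega_1 e$, $y_2 := \upsilon_1 e$ where

\[
\omega_1 := \frac{c(\theta r_q + r_s)}{\theta (r_s + r_q)} - c, \qquad
\upsilon_1 := \frac{c(\theta r_q + r_s)}{(1 + \theta e^T D^{-1} e) \theta r_s} - c - \omega_1,
\]

and let

\[
y_3 := \frac{c(\theta r_q + r_s)}{(1 + \theta e^T D^{-1} e) \theta r_s} D^{-1} d.
\]

Hence, $\bar y = y_1 + y_2 + y_3$. Similarly, $\bar z = z_1 + z_2 + z_3$ where $z_1 := \omega_2 e$, $z_2 := \upsilon_2 e$ where

\[
\omega_2 := \frac{c(r_q + \theta r_s)}{\theta (r_s + r_q)} - c, \qquad
\upsilon_2 := \frac{c(r_q + \theta r_s)}{(1 + \theta e^T F^{-1} e) \theta r_q} - c - \omega_2
\]

and

\[
z_3 := \frac{c(r_q + \theta r_s)}{(1 + \theta e^T F^{-1} e) \theta r_q} F^{-1} f.
\]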
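The splitting of $(D + \theta ee^T)^{-1} b_1$ into a multiple of $e$ plus a multiple of $D^{-1} d$ follows from the Sherman-Morrison formula and can be verified numerically. The sketch below assumes NumPy and assumes the readings $b_1 = c(r_s e - \mathrm{diag}(D))$ and $d = \theta r_s e - \mathrm{diag}(D)$ taken from the surrounding derivation; the variable names, sizes and random diagonal are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
r_q, r_s, p, c = 6, 9, 0.3, 1.0
theta = 1.0 - p

# Diagonal of D: number of non-neighbours in C_s of each node of C_q (kept positive).
diag_D = rng.integers(1, r_s + 1, size=r_q).astype(float)
e = np.ones(r_q)
b1 = c * (r_s * e - diag_D)
d = theta * r_s * e - diag_D

# Direct solve of (D + theta e e^T) y = b1.
y_direct = np.linalg.solve(np.diag(diag_D) + theta * np.outer(e, e), b1)

# Three-term decomposition y1 + y2 + y3.
s = (1.0 / diag_D).sum()                                   # e^T D^{-1} e
kappa = c * (theta * r_q + r_s) / ((1.0 + theta * s) * theta * r_s)
omega_1 = c * (theta * r_q + r_s) / (theta * (r_s + r_q)) - c
upsilon_1 = kappa - c - omega_1
y1 = omega_1 * e
y2 = upsilon_1 * e
y3 = kappa * d / diag_D                                    # kappa * D^{-1} d
y_split = y1 + y2 + y3
```

The term `y1` is deterministic given the clique sizes, while `y2` and `y3` collect the fluctuations of the degrees around their means; this is the reason the three pieces are bounded by different arguments.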

Therefore, we can further decompose $M_1$ as $M_1 = \tilde M_1 + \tilde M_2 + \tilde M_3$ where

\[
\tilde M_1 := H \circ (y_1 e^T + e z_1^T) - c(ee^T - H),
\]

\[
\tilde M_2 := H \circ (y_2 e^T + e z_2^T), \qquad \tilde M_3 := H \circ (y_3 e^T + e z_3^T).
\]

Notice that the matrix $\tilde M_1$ has entries corresponding to edges equal to $-c$ and remaining entries equal to $cp/(1 - p)$ since

\[
\omega_1 + \omega_2 = \frac{c(1 + \theta)(r_q + r_s)}{\theta (r_q + r_s)} - 2c = \frac{cp}{1 - p}.
\]

Therefore, each entry of the matrix $\tilde M_1$ has expected value equal to 0. Moreover, each entry of the random block matrix $\hat M$ of the form

\[
\hat M = \begin{pmatrix} \tilde M_1 \\ R \end{pmatrix}
\]

has expected value equal to 0 if $R$ has identically independently distributed entries such that

\[
R_{ij} = \begin{cases} -c, & \text{with probability } p \\ cp/(1 - p), & \text{with probability } 1 - p. \end{cases}
\]

Therefore, there exists $c_1, c_2, c_3, c_4 > 0$ such that

\[
\|\tilde M_1\| \le \|\hat M\| \le c_4 \sqrt{r_s} \tag{68}
\]

with probability at least $1 - c_1 \exp(-c_2 r_s^{c_3})$ by Theorem 4.1.

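Next, to obtain upper bounds on $\|\tilde M_2\|$ and $\|\tilde M_3\|$ we will use the following lemma.

Lemma 4.1 There exists $B > 0$ depending only on $p$ such that

\[
\sum_{i \in C_q} \frac{|\theta r_s - D_{ii}|^{\alpha}}{D_{ii}} \le B \frac{r_q}{r_s^{1 - \alpha/2}} \tag{69}
\]

and

\[
\sum_{i \in C_s} \frac{|\theta r_q - F_{ii}|^{\alpha}}{F_{ii}} \le B \frac{r_s}{r_q^{1 - \alpha/2}} \tag{70}
\]

for $\alpha = 1, 2$ with probability at least

\[
1 - (r_q + r_s) \nu_p^{\hat r} - 2 (2/3)^{\hat r} \tag{71}
\]

where $\nu_p = (e^{\delta}/(1 + \delta)^{(1 + \delta)})^p$ and $\delta = \min\{p, \sqrt{p} - p\}$.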
Proof: We first prove (69). For each $j \in C_q$, let $n_j := n_j^s$. The random numbers $\{n_j : j \in C_q\}$ are independent, and each is the result of $r_s$ Bernoulli trials, each with probability of success equal to $p$. We define $\Psi$ to be the event that at least one $n_j$ is very far from its expected value. That is, $\Psi$ is the event that there exists $j \in C_q$ such that $n_j > t r_s$, where $t = \min\{\sqrt{p}, 2p\}$. Moreover, we define $\bar\Psi$ to be its complement, and let $\psi(n_j)$ be the indicator function such that

\[
\psi(n_j) = \begin{cases} 1, & \text{if } n_j \le t r_s \\ 0, & \text{otherwise.} \end{cases}
\]


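Let $B$ be a positive scalar depending on $p$ to be determined later. Then

\begin{align*}
P\left( \sum_{i \in C_q} \frac{|\theta r_s - D_{ii}|^{\alpha}}{D_{ii}} \ge B \frac{r_q}{r_s^{1 - \alpha/2}} \right)
&= P\left( \sum_{j \in C_q} \frac{|n_j - p r_s|^{\alpha}}{r_s - n_j} \ge B \frac{r_q}{r_s^{1 - \alpha/2}} \right) \\
&\le P\left( \sum_{j \in C_q} \frac{|n_j - p r_s|^{\alpha}}{r_s - n_j} \ge B \frac{r_q}{r_s^{1 - \alpha/2}} \wedge \bar\Psi \right) + P(\Psi). \tag{72}
\end{align*}

We will analyze the two terms separately. For the first term we use a technique of Bernstein. Let $\phi$ be the indicator function of the nonnegative reals. Then

\begin{align*}
P\left( \sum_{j \in C_q} \frac{|n_j - p r_s|^{\alpha}}{r_s - n_j} \ge B \frac{r_q}{r_s^{1 - \alpha/2}} \wedge \bar\Psi \right)
&= P\left( \sum_{j \in C_q} \frac{r_s^{1 - \alpha/2} |n_j - p r_s|^{\alpha}}{r_s - n_j} - B r_q \ge 0 \; \wedge \; \psi(n_j) = 1 \; \forall j \in C_q \right) \\
&= E\left[ \phi\left( \sum_{j \in C_q} \left( \frac{|n_j - p r_s|^{\alpha}}{r_s^{\alpha/2 - 1}(r_s - n_j)} - B \right) \right) \prod_{j \in C_q} \psi(n_j) \right].
\end{align*}

Let $h$ be a positive scalar depending on $p$ to be determined later. Notice that for any $h > 0$ and all $x \in \mathbf{R}$, $\phi(x) \le \exp(hx)$. Thus, by the independence of the $n_j$'s,

\begin{align*}
P\left( \sum_{j \in C_q} \frac{|n_j - p r_s|^{\alpha}}{r_s - n_j} \ge B \frac{r_q}{r_s^{1 - \alpha/2}} \wedge \bar\Psi \right)
&\le E\left[ \exp\left( h \sum_{j \in C_q} \left( \frac{|n_j - p r_s|^{\alpha}}{r_s^{\alpha/2 - 1}(r_s - n_j)} - B \right) \right) \prod_{j \in C_q} \psi(n_j) \right] \\
&= \prod_{j \in C_q} E\left[ \exp\left( h \left( \frac{|n_j - p r_s|^{\alpha}}{r_s^{\alpha/2 - 1}(r_s - n_j)} - B \right) \right) \psi(n_j) \right]
= \prod_{j \in C_q} f_j
\end{align*}

where

\[
f_j = E\left[ \exp\left( h \left( \frac{|n_j - p r_s|^{\alpha}}{r_s^{\alpha/2 - 1}(r_s - n_j)} - B \right) \right) \psi(n_j) \right].
\]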

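We now analyze each $f_j$ individually. Fix $j \in C_q$. Then

\begin{align*}
f_j &= \sum_{i=0}^{\lfloor t r_s \rfloor} \exp\left( h \left( \frac{|i - p r_s|^{\alpha}}{r_s^{\alpha/2 - 1}(r_s - i)} - B \right) \right) P(n_j = i) \\
&\le \sum_{i=0}^{\lfloor t r_s \rfloor} \exp\left( h \left( \frac{|i - p r_s|^{\alpha}}{r_s^{\alpha/2}(1 - \sqrt{p})} - B \right) \right) P(n_j = i)
\end{align*}

since $i \le t r_s$ and, hence, $i \le \sqrt{p} \, r_s$. We now reorganize this summation by considering $i$ such that $|i - p r_s| < \sqrt{r_s}$, then $i$ such that $\sqrt{r_s} \le |i - p r_s| < 2\sqrt{r_s}$ and so on. Notice that, since $i \le t r_s \le 2 p r_s$, we need only to consider intervals until $|i - p r_s|$ reaches $p r_s$. Hence,

\begin{align*}
f_j &\le \sum_{k=0}^{\lfloor p \sqrt{r_s} \rfloor} \; \sum_{i : |i - p r_s| \in [k\sqrt{r_s}, (k+1)\sqrt{r_s})} \exp\left( h \left( \frac{|i - p r_s|^{\alpha}}{r_s^{\alpha/2}(1 - \sqrt{p})} - B \right) \right) P(n_j = i) \\
&\le \sum_{k=0}^{\lfloor p \sqrt{r_s} \rfloor} \exp\left( h \left( \frac{(k + 1)^{\alpha}}{1 - \sqrt{p}} - B \right) \right) P(|n_j - p r_s| \ge k \sqrt{r_s}) \\
&\le 2 \sum_{k=0}^{\lfloor p \sqrt{r_s} \rfloor} \exp\left( h \left( \frac{(k + 1)^{\alpha}}{1 - \sqrt{p}} - B \right) \right) \exp(-k^2/p)
\end{align*}

by (54). Overestimating the finite sum with an infinite sum, we have

\[
f_j \le 2 \exp(-hB) \cdot \sum_{k=0}^{\infty} \exp\left( \frac{h (k + 1)^{\alpha}}{1 - \sqrt{p}} - k^2/p \right).
\]

Choosing $h$ such that $h \le (1 - \sqrt{p})/(8p)$ ensures that

\[
\frac{h (k + 1)^{\alpha}}{1 - \sqrt{p}} - \frac{k^2}{p} \le -k^2/(2p)
\]

for all $r_q$, $r_s$ and $k \ge 1$. Hence, splitting off the $k = 0$ term, we have

\[
f_j \le 2 \exp\left( \frac{h}{1 - \sqrt{p}} - hB \right) + 2 \exp(-hB) \sum_{k=1}^{\infty} \exp(-k^2/(2p)). \tag{73}
\]

Since $\sum_{k=1}^{\infty} \exp(-k^2/(2p))$ is dominated by a geometric series, the summation in (73) is a finite number depending on $p$. Therefore, once $h$ is chosen, it is possible to choose $B$, depending only on $p$ and $h$, sufficiently large so that each of the two terms in (73) is at most $1/3$. Therefore, we can choose $h$ and $B$ so that $f_j \le 2/3$ for all $j \in C_q$. It follows immediately that

\[
P\left( \sum_{i \in C_q} \frac{|\theta r_s - D_{ii}|^{\alpha}}{D_{ii}} \ge B \frac{r_q}{r_s^{1 - \alpha/2}} \wedge \bar\Psi \right) \le (2/3)^{r_q} \le (2/3)^{\hat r}. \tag{74}
\]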
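To obtain a bound on the second term in (72), notice that the probability that $n_j > t r_s$ is at most $\nu_p^{r_s} \le \nu_p^{\hat r}$ where $\nu_p = (e^{\delta}/(1 + \delta)^{(1 + \delta)})^p$ by Theorem 4.3 where $\delta = t/p - 1 = \min\{p, \sqrt{p} - p\}$. Thus, applying the union bound shows that

\[
P\left( \sum_{i \in C_q} \frac{|\theta r_s - D_{ii}|^{\alpha}}{D_{ii}} \ge B \frac{r_q}{r_s^{1 - \alpha/2}} \right) \le (2/3)^{\hat r} + r_q \nu_p^{\hat r}
\]

and

\[
P\left( \sum_{i \in C_s} \frac{|\theta r_q - F_{ii}|^{\alpha}}{F_{ii}} \ge B \frac{r_s}{r_q^{1 - \alpha/2}} \right) \le (2/3)^{\hat r} + r_s \nu_p^{\hat r}.
\]

Applying the union bound one last time shows that (69) and (70) hold simultaneously with probability at least $1 - (r_q + r_s) \nu_p^{\hat r} - 2(2/3)^{\hat r}$ as required. ■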

As an immediate corollary of Lemma 4.1, we have the following bound on $|\upsilon_1|$ and $|\upsilon_2|$.

Corollary 4.1 There exists $B_1 > 0$ depending only on $p$ such that

\[
|\upsilon_1| + |\upsilon_2| \le B_1 \frac{r_q^{3/2} + r_s^{3/2}}{(r_q + r_s)(r_q r_s)^{1/2}}
\]

with probability at least $1 - (r_q + r_s) \nu_p^{\hat r} - 2(2/3)^{\hat r}$.
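Proof: We begin with $\upsilon_1$. Notice that

\[
\upsilon_1 = \frac{c(\theta r_q + r_s)}{\theta} \left( \frac{1}{(1 + \theta e^T D^{-1} e) r_s} - \frac{1}{r_q + r_s} \right)
= \frac{c(\theta r_q + r_s)(r_q - \theta r_s e^T D^{-1} e)}{\theta r_s (r_q + r_s)(1 + \theta e^T D^{-1} e)}.
\]

Moreover,

\[
|\theta r_s e^T D^{-1} e - r_q| = \left| \sum_{i \in C_q} \left( \frac{\theta r_s}{D_{ii}} - 1 \right) \right| \le \sum_{i \in C_q} \frac{|\theta r_s - D_{ii}|}{D_{ii}}
\]

and, since $D_{ii} \le r_s$ for all $i \in C_q$, we have

\[
r_s (1 + \theta e^T D^{-1} e) \ge r_s \left( 1 + \frac{\theta r_q}{r_s} \right) = \theta r_q + r_s. \tag{75}
\]

Therefore, setting $\alpha = 1$ in (69) shows that

\begin{align*}
|\upsilon_1| &\le \frac{c(\theta r_q + r_s) B r_q}{\theta r_s^{3/2} (r_q + r_s)(1 + \theta e^T D^{-1} e)} \\
&\le \frac{c B (\theta r_q + r_s) r_q}{\theta \sqrt{r_s} (r_q + r_s)(\theta r_q + r_s)} \\
&\le B_1 \frac{r_q}{\sqrt{r_s} (r_q + r_s)} \tag{76}
\end{align*}

where $B_1 := B/\theta$ and where (76) holds with probability at least $1 - (2/3)^{\hat r} - r_q \nu_p^{\hat r}$. By an identical calculation

\[
|\upsilon_2| \le B_1 \frac{r_s}{\sqrt{r_q} (r_q + r_s)} \tag{77}
\]

with probability at least $1 - (2/3)^{\hat r} - r_s \nu_p^{\hat r}$. Applying the union bound completes the proof. ■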

Observe that, as an immediate consequence of Corollary 4.1 and the facts that $H \circ ee^T = H$ and $\|H\|_F \le \sqrt{r_q r_s}$, we have

\[
\|\tilde M_2\| = \|H \circ (y_2 e^T + e z_2^T)\| \le (|\upsilon_1| + |\upsilon_2|) \|H\|_F
\le B_1 \frac{r_q^{3/2} + r_s^{3/2}}{r_q + r_s} \le 2 B_1 \sqrt{r_s} \tag{78}
\]

with probability at least $1 - (r_q + r_s) \nu_p^{\hat r} - 2(2/3)^{\hat r}$.

The following corollary of Lemma 4.1 provides an upper bound on $\|\tilde M_3\|$.

Corollary 4.2 There exists $B_2$ depending only on $p$ such that

\[
\|\tilde M_3\| = \|H \circ (y_3 e^T + e z_3^T)\| \le B_2 (\sqrt{r_q} + \sqrt{r_s}) \tag{79}
\]

with probability at least $1 - (r_q + r_s) \nu_p^{\hat r} - 2(2/3)^{\hat r}$.

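Proof: To obtain an upper bound on $\|\tilde M_3\|$, we first obtain upper bounds on $\|H \circ (y_3 e^T)\|$ and $\|H \circ (e z_3^T)\|$. We begin with $\|H \circ (y_3 e^T)\|$. Since $\|H(i,:)\|^2 = D_{ii}$ and $d_i = \theta r_s - D_{ii}$, we have

\[
\sum_{i \in C_q} \left( \frac{d_i}{D_{ii}} \right)^2 \|H(i,:)\|^2 = \sum_{i \in C_q} \frac{|\theta r_s - D_{ii}|^2}{D_{ii}},
\]

so applying (69) with $\alpha = 2$ and (33) with $W = H$, $u = y_3$, and $v = e$ shows that

\begin{align*}
\|H \circ (y_3 e^T)\| &\le \left( \sum_{i \in C_q} [y_3]_i^2 \, \|H(i,:)\|^2 \right)^{1/2}
= \frac{c(\theta r_q + r_s)}{(1 + \theta e^T D^{-1} e) \theta r_s} \left( \sum_{i \in C_q} \frac{|\theta r_s - D_{ii}|^2}{D_{ii}} \right)^{1/2} \\
&\le \frac{c(\theta r_q + r_s)}{(1 + \theta e^T D^{-1} e) \theta r_s} (B r_q)^{1/2} \tag{80} \\
&\le B_2 \sqrt{r_q} \tag{81}
\end{align*}

where $B_2 := \sqrt{B}/\theta$, (81) follows from (75) and (80) holds with probability at least $1 - (2/3)^{\hat r} - r_q \nu_p^{\hat r}$. Similarly,

\[
\|H \circ (e z_3^T)\| \le B_2 \sqrt{r_s} \tag{82}
\]

with probability at least $1 - ((2/3)^{\hat r} + r_s \nu_p^{\hat r})$. Applying the union bound shows that

\[
\|\tilde M_3\| \le \|H \circ (y_3 e^T)\| + \|H \circ (e z_3^T)\| \le B_2 (\sqrt{r_q} + \sqrt{r_s}) \tag{83}
\]

with probability at least $1 - (r_q + r_s) \nu_p^{\hat r} - 2(2/3)^{\hat r}$ as required. ■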

We complete the proof of Theorem 4.6 by showing that $M_2 = H \circ (Q_1 b e^T + e b^T Q_2^T)$ has norm at most a constant multiple of $r_s/\sqrt{r_q}$ with high probability. The following lemma provides an upper bound on $\|Q_1 b\|$ and $\|Q_2 b\|$.

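Lemma 4.2 There exists $B_3$, $B_4$ and $c_i > 0$, $i = 1, 2, 3$, depending only on $p$ such that

\[
\|Q_1 b\| \le B_3 \frac{r_s^{1/2}}{r_q^{1/2}} \tag{84}
\]

\[
\|Q_2 b\| \le B_4 \frac{r_s^{1/2} (r_q + r_s^{1/2})}{r_q^{3/2}} \tag{85}
\]

with probability at least

\[
1 - c_1 \exp(-c_2 \hat r^{c_3}) - (r_q + r_s) \left( \frac{e^{\delta}}{(1 + \delta)^{(1 + \delta)}} \right)^{p \hat r} \tag{86}
\]

where $\delta = (1 - p)/(2p)$.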

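Proof: We first derive a bound on each of $\|Q_1\|$, $\|Q_2\|$ and $\|b\|$ and consequently a bound on each of $\|Q_1 b\|$ and $\|Q_2 b\|$ by applying the inequalities

\[
\|Q_1 b\| \le \|Q_1\| \|b\| \quad \text{and} \quad \|Q_2 b\| \le \|Q_2\| \|b\|.
\]

Recall that

\[
\|Q_1\| \le \|(D + \theta ee^T)^{-1}\| \sum_{\ell=1}^{\infty} \|P_1P_2\|^{\ell} + \|P_1\| \, \|(F + \theta ee^T)^{-1}\| \sum_{\ell=0}^{\infty} \|P_1P_2\|^{\ell}
\]

and

\[
\|Q_2\| \le \|(F + \theta ee^T)^{-1}\| \sum_{\ell=1}^{\infty} \|P_1P_2\|^{\ell} + \|P_2\| \, \|(D + \theta ee^T)^{-1}\| \sum_{\ell=0}^{\infty} \|P_1P_2\|^{\ell}
\]

where

\[
P_1 = (D + \theta ee^T)^{-1}(H - \theta ee^T), \qquad P_2 = (F + \theta ee^T)^{-1}(H^T - \theta ee^T).
\]

Applying the upper bounds on $\|(D + \theta ee^T)^{-1}\|$, $\|(F + \theta ee^T)^{-1}\|$, and $\|H - \theta ee^T\|$ given by (64), (65), and (67) shows that

\[
\|P_1P_2\| \le \frac{\|H - \theta ee^T\|^2}{(\min_{i \in C_q} D_{ii})(\min_{i \in C_s} F_{ii})} \le \frac{\gamma_1^2}{(\theta - \delta p)^2 r_q} \tag{87}
\]

with probability at least (86). Therefore, there exists $\gamma_2 > 0$ depending only on $p$ such that

\[
\|Q_1\| \le \frac{1}{(\theta - \delta p) r_s} \sum_{\ell=1}^{\infty} \left( \frac{\gamma_1^2}{(\theta - \delta p)^2 r_q} \right)^{\ell}
+ \frac{\gamma_1}{(\theta - \delta p)^2 r_q r_s^{1/2}} \sum_{\ell=0}^{\infty} \left( \frac{\gamma_1^2}{(\theta - \delta p)^2 r_q} \right)^{\ell}
\le \frac{\gamma_2}{r_q r_s^{1/2}} \tag{88}
\]

with probability at least (86) since

\[
\sum_{\ell=0}^{\infty} \left( \frac{\gamma_1^2}{(\theta - \delta p)^2 r_q} \right)^{\ell} \le O(1)
\]

and

\[
\sum_{\ell=1}^{\infty} \left( \frac{\gamma_1^2}{(\theta - \delta p)^2 r_q} \right)^{\ell} \le O(r_q^{-1})
\]

with probability at least (86) in the case that $r_q > (\gamma_1/(\theta - \delta p))^2$. Similarly, there exists $\gamma_3 > 0$ depending only on $p$ such that

\[
\|Q_2\| \le \gamma_3 \left( \frac{1}{r_q^2} + \frac{1}{r_q r_s^{1/2}} \right) = \gamma_3 \frac{r_q + r_s^{1/2}}{r_q^2 r_s^{1/2}} \tag{89}
\]

with probability at least (86). Finally, recall that

\[
b_i = c \cdot \begin{cases} n_i^s, & \text{if } i \in C_q \\ n_i^q, & \text{if } i \in C_s. \end{cases}
\]

Therefore, by (61) and (62)

\[
\|b\| = c \left( \sum_{i \in C_q} (n_i^s)^2 + \sum_{i \in C_s} (n_i^q)^2 \right)^{1/2}
\le (1 + \delta) p c \left( r_q r_s^2 + r_s r_q^2 \right)^{1/2}
= (1 + \delta) p c (r_q r_s)^{1/2} (r_q + r_s)^{1/2}
\]

with probability at least $1 - (r_q + r_s)(e^{\delta}/(1 + \delta)^{(1 + \delta)})^{p \hat r}$. Thus, applying the union bound shows that there exists $B_3$, $B_4$ depending only on $p$ such that

\[
\|Q_1 b\| \le \frac{\gamma_2 (1 + \delta) p c (r_q r_s)^{1/2} (r_q + r_s)^{1/2}}{r_q r_s^{1/2}} \le B_3 \frac{r_s^{1/2}}{r_q^{1/2}}
\]

\[
\|Q_2 b\| \le \frac{\gamma_3 (1 + \delta) p c (r_q r_s)^{1/2} (r_q + r_s)^{1/2} (r_q + r_s^{1/2})}{r_q^2 r_s^{1/2}} \le B_4 \frac{r_s^{1/2} (r_q + r_s^{1/2})}{r_q^{3/2}}
\]

with probability at least (86) since $(r_q + r_s)^{1/2} \le O(r_s^{1/2})$. ■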

Finally, to obtain an upper bound on $\|M_2\|$ we decompose $M_2$ as

\[
M_2 = (H - \theta ee^T) \circ (Q_1 b e^T) + \theta (Q_1 b e^T) + (H - \theta ee^T) \circ (e (Q_2 b)^T) + \theta (e (Q_2 b)^T).
\]
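This decomposition rests on the identity $H \circ X = (H - \theta ee^T) \circ X + \theta X$, applied to the two rank-one matrices $Q_1 b e^T$ and $e (Q_2 b)^T$. The sketch below checks it numerically, assuming NumPy; the vectors `w1` and `w2` are random stand-ins for $Q_1 b$ and $Q_2 b$ (which are not computed here), and the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
r_q, r_s, theta = 4, 6, 0.7

H = (rng.random((r_q, r_s)) < theta).astype(float)   # 0/1 matrix, r_q x r_s
w1 = rng.standard_normal(r_q)                         # stand-in for Q_1 b
w2 = rng.standard_normal(r_s)                         # stand-in for Q_2 b

X1 = np.outer(w1, np.ones(r_s))                       # (Q_1 b) e^T
X2 = np.outer(np.ones(r_q), w2)                       # e (Q_2 b)^T

M2 = H * (X1 + X2)
H0 = H - theta                                        # H - theta e e^T
M2_split = H0 * X1 + theta * X1 + H0 * X2 + theta * X2

# Rank-one norms used in the bounds that follow: ||w e^T|| = ||w|| ||e||.
norm_X1 = np.linalg.norm(X1, 2)
norm_X2 = np.linalg.norm(X2, 2)
```

Splitting off the mean $\theta ee^T$ is what allows the random-matrix bound (67) to be applied to the first and third terms, while the second and fourth terms are plain rank-one matrices.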
As an immediate corollary of Lemma 4.2 we have

\[
\|(Q_1 b) e^T\| = \|Q_1 b\| \|e\| = \sqrt{r_s} \|Q_1 b\| \le B_3 \frac{r_s}{r_q^{1/2}} \tag{90}
\]

and

\[
\|e (Q_2 b)^T\| = \|e\| \|Q_2 b\| = \sqrt{r_q} \|Q_2 b\| \le B_4 \frac{r_s^{1/2} (r_q + r_s^{1/2})}{r_q} \tag{91}
\]

with probability at least (86). Moreover, applying (32) with $W = H - \theta ee^T$, $u = Q_1 b$, and $v = e$ we have

\[
\|(H - \theta ee^T) \circ (Q_1 b e^T)\| \le \|H - \theta ee^T\| \|Q_1 b\|_\infty \le \|H - \theta ee^T\| \|Q_1 b\|. \tag{92}
\]

Thus, combining (92), (67), and (84) we have

\[
\|(H - \theta ee^T) \circ (Q_1 b e^T)\| \le B_3 \gamma_1 \frac{r_s}{r_q^{1/2}} \tag{93}
\]

with probability at least (86). Similarly,

\[
\|(H - \theta ee^T) \circ (e (Q_2 b)^T)\| \le B_4 \gamma_1 \frac{r_s (r_q + r_s^{1/2})}{r_q^{3/2}} \tag{94}
\]

with probability at least (86). Therefore, there exists $\hat c$ depending only on $p$ such that

\[
\|M_2\| \le \hat c \, \frac{r_s}{r_q^{1/2}} \tag{95}
\]

with probability at least (86) since $r_s \le O(r_q^2)$ and, hence, $(r_q + r_s^{1/2})/r_q \le O(1)$.

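Combining (68), (78), (83), and (95), there exists some $\tilde B_1$ depending only on $p$ such that

\[
\|S(C_q, C_s)\| \le \tilde B_1 \frac{\max\{r_q, r_s\}}{(\min\{r_q, r_s\})^{1/2}} \tag{96}
\]

for all $q, s \in \{1, \dots, k\}$, $q \ne s$ with probability tending exponentially to 1 as $\hat r$ approaches $\infty$. It follows that

\[
\|S_1\|^2 \le \sum_{q=1}^{k} \sum_{s=1}^{k} \|S(C_q, C_s)\|^2 \le \tilde B_1^2 \sum_{q=1}^{k} \sum_{s=1}^{k} \frac{\max\{r_q, r_s\}^2}{\min\{r_q, r_s\}}
\]

and, hence, there exists some $\beta_1$ depending only on $p$ such that

\[
\|S_1\| \le \beta_1 \left( \sum_{s=1}^{k} r_s^2 \right)^{1/2} \left( \sum_{q=1}^{k} \frac{1}{r_q} \right)^{1/2}
\]

as required.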
5 Conclusions

We have considered an NP-hard combinatorial version of the clustering problem called the $k$-disjoint-clique problem in which input data is an undirected graph. We have shown that a convex relaxation of the problem can exactly solve the problem for input instances constructed in a certain way. The construction of the instance is that $k$ disjoint cliques are first placed in the input graph, and then many 'noise' vertices and edges are placed that obscure the $k$ disjoint cliques. We have shown that the algorithm exactly recovers the cliques for noise edges placed by an adversary provided the conditions stated in Theorem 3.1 on the number of noise edges are satisfied. In the case of random noise, many more noise edges and nodes can be tolerated compared to the adversary case; in particular, if the quantity on the right-hand side of (56) in Theorem 4.5 is at most $\hat r - 1$, then the algorithm recovers the planted cliques with probability exponentially close to 1.

This work raises several open questions. First, as already noted in the text, our bounds may not be the best possible. Particularly in the random case, there is nothing in the way of matching lower bounds.

Another open question is whether the techniques developed herein can be applied to other formulations of clustering. For example, if clustering is posed as an optimization problem with a distance function, then can an approach like the one described here find the optimal solution for input instances constructed in a certain way?